SCALABLE ARCHITECTURE FOR HIGH DENSITY CPLD'S HAVING TWO-LEVEL HIERARCHY OF ROUTING RESOURCES

This application is a Div. of Ser. No. 09/326,940, filed Jun. 6, 1999, now U.S. Pat. No. 6,184,713.

BACKGROUND 

1. Field of Invention 

The invention is generally directed to monolithic integrated circuits, and more specifically to a scalable architecture for use within Programmable Logic Devices (PLD's). It is even more specifically directed to a subclass of PLD's known as High-Density Complex Programmable Logic Devices (HCPLD's).

2. Cross Reference to Related Patents 

The disclosures of the following U.S. patents are incorporated herein by reference:

(A) U.S. Pat. No. 5,764,078 issued Jun. 9, 1998 to Om Agrawal et al., and entitled, FAMILY OF MULTIPLE SEGMENTED PROGRAMMABLE LOGIC BLOCKS INTERCONNECTED BY A HIGH SPEED CENTRALIZED SWITCH MATRIX;

(B) U.S. Pat. No. 5,811,986 issued Sep. 22, 1998 to Om Agrawal et al., and entitled, FLEXIBLE SYNCHRONOUS/ASYNCHRONOUS CELL STRUCTURE FOR HIGH DENSITY PROGRAMMABLE LOGIC DEVICE;

(C) U.S. Pat. No. 5,818,254 issued Oct. 6, 1998 to Om Agrawal et al., and entitled, MULTI-TIERED HIERARCHICAL HIGH SPEED SWITCH MATRIX STRUCTURE FOR VERY HIGH DENSITY COMPLEX PROGRAMMABLE LOGIC DEVICES;

(D) U.S. Pat. No. 5,789,939 issued Aug. 4, 1998 to Om Agrawal et al., and entitled, METHOD FOR PROVIDING A PLURALITY OF HIERARCHICAL SIGNAL PATHS IN A VERY HIGH DENSITY PROGRAMMABLE LOGIC DEVICE;

(E) U.S. Pat. No. 5,621,650 issued Apr. 15, 1997 to Om Agrawal et al., and entitled, PROGRAMMABLE LOGIC DEVICE WITH INTERNAL TIME-CONSTANT MULTIPLEXING OF SIGNALS FROM EXTERNAL INTERCONNECT BUSES; and

(F) U.S. Pat. No. 5,185,706 issued Feb. 9, 1993 to Om Agrawal et al.

3. Description of Related Art 

Field-Programmable Logic Devices (FPLD's) have continuously evolved to better serve the unique needs of different end-users. From the time of introduction of simple PLD's such as the Advanced Micro Devices 22V10™ Programmable Array Logic device (PAL), the art has branched out in several different directions.

One evolutionary branch of FPLD's has grown along a paradigm known as Field Programmable Gate Arrays or FPGA's. Examples of such devices include the XC2000™ and XC3000™ families of FPGA devices introduced by Xilinx, Inc. of San Jose, Calif. The architectures of these devices are exemplified in U.S. Pat. Nos. 4,642,487; 4,706,216; 4,713,557; and 4,758,985; each of which is originally assigned to Xilinx, Inc.

An FPGA may be generally characterized as a monolithic, integrated circuit that has an array of user-programmable lookup tables (LUT's) that can each implement any Boolean function to the extent allowed by the address space of the LUT. User-programmable interconnect is typically provided for interconnecting primitive, LUT-implemented functions and for thereby defining more complex functions.
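To make the LUT idea concrete, here is a minimal sketch. It is not taken from this document; the `LUT` class and its `program` method are hypothetical names. The sketch models a k-input LUT as a 2^k-entry truth table. The table's configuration bits are filled in by evaluating a user function on every input combination, and the input bits are then used as the address into that table. Any Boolean function of k inputs can be stored this way, which is what "to the extent allowed by the address space of the LUT" amounts to.

```python
from itertools import product
from typing import Callable, Sequence


class LUT:
    """Hypothetical k-input lookup table modeled as a 2**k-entry truth table."""

    def __init__(self, k: int, bits: Sequence[int]) -> None:
        if len(bits) != 2 ** k:
            raise ValueError("a k-input LUT needs exactly 2**k configuration bits")
        self.k = k
        self.bits = [b & 1 for b in bits]

    @classmethod
    def program(cls, k: int, fn: Callable[..., int]) -> "LUT":
        # "Configure" the LUT by evaluating fn on every input combination.
        # product() enumerates (0,...,0), (0,...,1), ... so input 0 acts as
        # the most significant address bit.
        bits = [fn(*combo) for combo in product((0, 1), repeat=k)]
        return cls(k, bits)

    def __call__(self, *inputs: int) -> int:
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(inputs)}")
        # The input bits form the address into the stored truth table.
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.bits[address]


# Two different 3-input functions realized by the same LUT structure.
xor3 = LUT.program(3, lambda a, b, c: a ^ b ^ c)
majority = LUT.program(3, lambda a, b, c: (a & b) | (b & c) | (a & c))

print(xor3(1, 0, 1))      # 0
print(majority(1, 0, 1))  # 1
print(xor3.bits)          # [0, 1, 1, 0, 1, 0, 0, 1]
```

Only the stored bits differ between the XOR and majority tables; the hardware structure is identical. The limit is k. A function of more than k inputs cannot be addressed by one LUT, so it has to be built by interconnecting several LUT-implemented primitives, as the paragraph above describes.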

Because LUT-based function implementation tends to be functionally more exhaustive (broader) but speed-wise slower than gate-based (e.g., AND/OR-based) function implementation, FPGA's are generally recognized in the art as having a relatively expansive capability of implementing a wide variety of functions (broad functionality) but at relatively slow speed. Also, because the length of signal routings through the programmable interconnect of an FPGA can vary significantly, FPGA's are generally recognized as providing relatively inconsistent signal delays whose values can vary substantially depending on how partitioning, placement and routing software configures the FPGA.

A second evolutionary chain in the art of field programmable logic has branched out along a paradigm known as Complex PLD's or CPLD's. This paradigm is characterized by devices such as the Vantis (subsidiary of Advanced Micro Devices Inc.) MACH™ family. Examples of CPLD circuitry are seen in U.S. Pat. No. 5,015,884 (issued May 14, 1991 to Om P. Agrawal et al.) and U.S. Pat. No. 5,151,623 (issued Sep. 29, 1992 to Om P. Agrawal et al.) as well as in other CPLD patents cited above.

A CPLD device can be characterized as a monolithic, integrated circuit (IC) that has four major features as follows.

(1) A user-accessible, configuration-defining memory means, such as EPROM, EEPROM, anti-fused, fused, SRAM, or other, is provided in the CPLD device so as to be at least once-programmable by device users for defining user-provided configuration instructions. Static Random Access Memory or SRAM is, of course, a form of reprogrammable memory that can be differently programmed many times. Electrically Erasable and reprogrammable ROM or EEPROM is an example of nonvolatile reprogrammable memory. The configuration-defining memory of a CPLD device can be formed of a mixture of different kinds of memory elements if desired (e.g., SRAM and EEPROM). Typically it is of the nonvolatile, In-System reprogrammable (ISP) kind such as EEPROM.

(2) Input/Output means (IO's) are provided for interconnecting internal circuit components of the CPLD device with external circuitry. The IO's may have fixed configurations or they may include configurable features such as variable slew-output drivers whose characteristics may be fine tuned in accordance with user-provided configuration instructions stored in the configuration-defining memory means.

(3) Programmable Logic Blocks (PLB's) are provided for carrying out user-programmed logic functions as defined by user-provided configuration instructions stored in the configuration-defining memory means. Typically, each of the many PLB's of a CPLD has at least a Boolean sum-of-products generating circuit (e.g., an AND/OR array) or a Boolean product-of-sums generating circuit (e.g., an OR/AND array) that is user-configurable to define a desired Boolean function, to the extent allowed by the number of product terms (PTs) or sum terms that are combinable by that circuit.

Each PLB may have other resources such as input signal pre-processing resources and output signal post-processing resources. The output signal post-processing resources may include result storing and/or timing adjustment resources such as clock-synchronized registers. Although the term 'PLB' was adopted by early pioneers of CPLD technology, it is not uncommon to see other names being given to the repeated portion of the CPLD that carries out user-programmed logic functions and timing adjustments to the resultant function signals.
(4) An interconnect network is generally provided for carrying signal traffic within the CPLD between various PLB's and/or between various IO's and/or between various IO's and PLB's. At least part of the interconnect network is typically configurable so as to allow for programmably-defined routing of signals between various PLB's and/or IO's in accordance with user-defined routing instructions stored in the configuration-defining memory means. Another part of the interconnect network may be hard wired or nonconfigurable such that it does not allow for programmed definition of the path to be taken by respective signals traveling along such hard wired interconnect.

In contrast to LUT-based FPGA's, gate-based CPLD's are generally recognized in the art as having a relatively less-expansive capability of implementing a wide variety of functions (in other words, not being able to implement all Boolean functions for a given input space) but being able to do so at relatively higher speeds. In other words, very wide functionality is sacrificed to obtain shorter, pin-to-pin signal delays. Also, because the length of signal routings through the programmable interconnect of a CPLD is often arranged so it will not vary significantly despite different signal routings, CPLD's are generally recognized as being able to provide relatively consistent signal delays whose values do not vary substantially based on how partitioning, placement and routing software configures the CPLD. Many devices in the Vantis MACH™ family provide such a consistent signal delay characteristic under the Vantis trade name of Speed-Locking™. The more generic term Speed-Consistency will be used interchangeably herein with the term Speed-Locking™.

A further evolutionary sub-branch of the growing family of CPLD devices is known as High-Density Complex Programmable Logic Devices (HCPLD's). This sub-branch may be generally characterized as monolithic IC's that have large numbers of I/O terminals (e.g., input/output pins) in the range of about 64 or more (e.g., 96, 128, 192, 256, 320, etc.) and/or have large numbers of result-storing macrocells in the range of about 256 or more (e.g., 320, 512, 1024, etc.). The process of incorporating large numbers of I/O terminals and/or large numbers of macrocells into a single CPLD IC raises new challenges for achieving relatively broad functionality, high speed, and Speed-Consistency (Speed-Locking™) in the face of wide varieties of configuration software.

Configuration software can produce different results, good or bad, depending in part on what broadness of functionalities, what routing flexibilities and what timing flexibilities are provided by the architecture of the target CPLD. Modern CPLD's typically offer a large spectrum of user-configurable options with respect to how each of many PLB's can be configured, how each of many interconnect resources can be configured, and how each of many IO's can be used and/or configured. Rather than determining with pencil and paper how each of the configurable resources of a CPLD should be programmed, it is common practice to employ a computer and appropriate CPLD-configuring software to automatically generate the configuration instruction signals that will be supplied to an unprogrammed CPLD and that will cause it to implement a specific design.

CPLD-configuring software typically cycles through a series of phases that are referred to commonly as 'fitting'. These phases may also be referred to as 'partitioning', 'placement', and 'routing'. The fitting software is sometimes referred to as a 'place and route' program. Alternate names of software tools that operate at a more global level may include 'synthesis, mapping and optimization tools'.

In the partitioning phase, an original circuit design (which is usually relatively large and complex) is divided into smaller chunks, where each chunk is made sufficiently small to be implemented within a single PLB (where such a PLB typically includes one or more AND/OR arrays). During the partitioning phase, the resources of each PLB remain as those of a yet-unspecified one of the many PLB's that are available in the yet-unprogrammed CPLD device. It is during placement that physical locations and groupings are assigned to partitioned chunks.

Differently designed CPLD's can have differently designed PLB's with respectively different logic-implementing capabilities and timing capabilities. As such, the maximum size that will be allowed for a partitioned chunk can vary in accordance with the specific CPLD device that is designated to implement the original circuit design.

By way of example, each PLB of a given, first CPLD architecture may be able to generate in one pass (where the one pass does not include the use of a feedback loop) a sum-of-products (SoP) function signal of the expressive form:

$$f_{SoP} = \sum_{i=1}^{N} \left( PT_i : K_i \le K_{max}/L \right) \tag{Exp. A}$$

In the above sum-of-products expression (Exp. A), the N factor represents a maximum number of product terms (PT's) that can be generated and thereafter summed by a respective PLB for defining one sum-of-products function signal, f_SoP. Kmax represents the maximum number of independent, PLB input terms that can be acquired from a set of L lines. Ki is the number of actual signals that are used as a subset of Kmax for defining a corresponding, i-th product term; these are the signals that the PLB uses to define each respective, i-th product term (PTi), and that may therefore contribute to the final sum.

By way of a more concrete example, consider a PLB of a given, first CPLD architecture where each sum-of-products can have a maximum of 3 PT's, with each PT being a product of no more than 16 input terms, where the input terms are sampled from 64 nearby lines. Such a PLB may therefore be able to generate in one pass a first SoP function of the expressive form:

$$f_{SoP1} = \sum_{i=1}^{3} \left( PT_i : K_i \le 16_{max}/64 \right) \tag{Exp. A1}$$

Consider also, for purposes of contrast, a PLB of a given second, and differently designed, CPLD architecture where each sum-of-products can have a maximum of 4 PT's, with each PT being a product of no more than 32 input terms, where the input terms are sampled from 96 nearby lines. Such a PLB may therefore be able to generate in one pass a second SoP function of the expressive form:

$$f_{SoP2} = \sum_{i=1}^{4} \left( PT_i : K_i \le 32_{max}/96 \right) \tag{Exp. A2}$$

In other words, due to architectural constraints, it is possible that the one-pass, sum-of-products result ($f_{SoP1} = PT_1 + PT_2 + PT_3$, see Exp. A1) of a PLB in the first CPLD architecture can be no more complex than a sum of three independent product terms (3 PT's), where each such PTi is no more complex than a product of no more than sixteen (16) independent, PLB term input signals that are sampled out of an available and larger set of sixty-four (64) independent signals.

In contrast, and again due to architectural variations, the one-pass, sum-of-products result ($f_{SoP2} = PT_1 + PT_2 + PT_3 + PT_4$, see Exp. A2) of a PLB in the second CPLD architecture
can be as complex as a sum of four independent product terms (4 PT's), where each such PTi is as complex as a product of up to 32 independent, PLB term input signals that are sampled by multiplexing from an available and nearby set of 96 independent signals. The 1-out-of-3 sampling ratio, 32max/96, that is implied in expression Exp. A2 is an input multiplexing factor of the PLB. It shows that the PLB has only 32 input lines whose maximum of 32 input signals are sampled from a nearby array of 96 signal broadcasting lines. At least three such PLB's are needed to sample all 96 of the broadcast signals. But no one PLB can, in a single pass (a single time slice that does not use feedback), generate a function signal that represents a function of as many as all 96 of the broadcast signals.

The significance of factors such as the above N, Kmax and L, and the significance of the ratio Kmax/L, will become more apparent later. For now it should be understood that choice of the N, Kmax and L factors for a PLB is a matter of delicate design balance.

On the one hand, by choosing to use larger absolute values for N, Kmax and L plus larger values of the ratio Kmax/L, a CPLD designer can advantageously provide greater flexibility to the number of options that CPLD configuring software will have as it performs partitioning, placement and routing. On the other hand, if the CPLD designer arbitrarily chooses to increase the values of N, Kmax and L and to increase the ratio Kmax/L, the designer may find that such modifications have led to excessive electrical capacitance on routing lines and excessive signal processing delays.

The reason is that Kmax times L defines a number of crosspoints that will be created for each PLB when the Kmax number of lines of each PLB cross with the L number of adjacent, signal broadcasting lines. The reciprocal of Kmax/L indicates the minimum number of PLB's that will be needed to fully sample all L of the adjacent signals. (L/Kmax times Kmax equals L.) Typically, the CPLD designer will want the CPLD to be able to process all L signals simultaneously (in parallel), so the designer will provide at least an L/Kmax number of PLB's. The same reciprocal ratio, L/Kmax, also gives a rough indication of the extent to which the L signal broadcasting lines of the CPLD architecture will be loaded by PIP's (programmable interconnect points). The exact value of loading will depend on the extent to which each set of L times Kmax crosspoints is fully or partially populated by PIP's.

One previous patent (U.S. Pat. No. 5,818,254 issued Oct. 6, 1998) suggests that the number of input lines per PLB (Kmax) should be kept relatively small (e.g., about 32 PLB input lines or less) and that a 3-level hierarchical switch matrix should be employed to avoid excessive signal processing delays in HCPLD's. This approach has benefits and drawbacks. On the one hand, capacitive loading is reduced for global interconnect. On the other hand, a 3-level hierarchy in the switch matrix architecture can make Speed-Locking™ problematic as one tries to migrate to higher density devices.

With the above factors in mind, we will now continue our discussion about the basics of CPLD configuring software. The design partitioning phase needs to account for the PLB architectural factors, L, N, and Kmax, because those values respectively define: (a) how many signals can be processed in parallel by the available plurality of PLB's, (b) how many product terms can be incorporated into each sum-of-products signal, and (c) what portion of the L available signals each product term can encompass in just one pass.

Of course, for purposes of entering into the design partitioning phase, the original circuit design can be originally specified in terms of a Hardware Description Language (HDL) and/or as a gate level description, or in other suitable form. Ultimately, and in one way or another, the original circuit design will have to be re-mapped into terms of sums-of-products, where such sums-of-products (SoP's) are to be implemented in respective time slots (passes) by respective ones of available PLB's and then fed forward in parallel by L lines for subsequent processing.

In order to fit its results into the limited f_SoP capabilities of each PLB, the design partitioning phase will have to cast the original circuit design into sums-of-products whose sizes are equal to or less than the N, Kmax and L-defined limits of the f_SoP results that can be produced by respective PLB's of the targeted CPLD.

If the architecture of the targeted CPLD is such that each of the above-described factors, N, Kmax and L (Exp. A), is relatively large, then the maximal f_SoP results per PLB will tend to be relatively large and the design partitioning phase will be advantageously allowed to work with larger-sized partition chunks. However, signal delay may become excessive if N, Kmax and L are too large.

On the other hand, if the architecture of the targeted CPLD is such that each of the above-described factors, N, Kmax and L (Exp. A), is relatively small, then the maximal f_SoP results per PLB will tend to be relatively small and the design partitioning phase will be disadvantageously forced to work with comparably smaller-sized partition chunks and a larger number of interconnect lines. Signal delay may be more or less of a problem because of this. However, one thing will be generally true. As the partitioning phase is forced to produce larger and larger numbers of decreasing-in-size chunks, more work is disadvantageously created for the next-described placement and routing phases of the CPLD configuring software, because they have to process more data objects.

After the partitioning phase is carried out, each resulting chunk is virtually positioned ('placed') into a specific, chunk-implementing PLB of the designated CPLD during a subsequent placement phase.

In the ensuing routing phase, an attempt is made to algorithmically establish connections between the various chunk-implementing PLB's of the CPLD device, using the interconnect resources (the L lines) of the designated CPLD device. The goal is to reconstruct the functionality of the original circuit design by appropriately connecting all the partitioned and placed chunks.

If all goes well in the partitioning, placement, and routing phases, the CPLD configuring software will find a workable 'solution' comprised of a specific partitioning of the original circuit, a specific set of primitive placements in specific PLB's, and a specific set of interconnect usage decisions (routings). The software can then deem its mission to be complete, and it can use the placement and routing results to generate the configuring code that will be used to correspondingly configure the designated CPLD.

In various instances, however, the CPLD configuring software may find that it cannot complete its mission successfully on a first try. It may find, for example, that the initially-chosen placement strategy prevents the routing phase from completing successfully. This might occur because signal routing resources have been exhausted in one or more congested parts of the designated CPLD device, so that some necessary interconnections could not be completed through those congested parts. Alternatively, all necessary interconnections may have been completed, but the CPLD configuring software may find that simulation-predicted performance of the resulting circuit (the
so-configured CPLD) is below an acceptable threshold. For example, signal propagation time may be too large in a speed-critical part of the CPLD-implemented circuit.

In either case, if the initial partitioning, placement and routing phases do not provide an acceptable solution, the CPLD configuring software will try to modify its initial place and route choices so as to remedy the problem. Typically, the software will make iterative modifications to its initial choices until at least a functional place-and-route strategy is found (one where all necessary connections are completed), and more preferably until a place-and-route strategy is found that brings performance of the CPLD-implemented circuit to a near-optimum point. The latter step is at times referred to as 'optimization'. Modifications attempted by the software may include re-partitionings of the original circuit design as well as repeated iterations of the place and route phases.

There are usually a very large number of possible choices in each of the partitioning, placement, and routing phases. CPLD configuring programs typically try to explore a multitude of promising avenues within a finite amount of time to see what effects each partitioning, placement, and routing move may have on the ultimate outcome. This is in a way analogous to how chess-playing machines explore the ramifications of each move of each chess piece on the end-game. Even when relatively powerful, high-speed computers are used, it may take the CPLD configuring software a significant amount of time to find a workable solution.

In some instances, even after having spent a large amount of time trying to find a solution for a given CPLD-implementation problem, the CPLD configuring software may fail to come up with a workable solution, and the time spent becomes lost turn-around time. It may be that, because of packing inefficiencies, the user has chosen too small a CPLD device for implementing too large an original circuit.

Another possibility is that the internal architecture of the designated CPLD device does not mesh well with the organization and/or timing requirements of the original circuit design.

Organizations of original circuit designs can include portions that may be described as 'random logic' (because they have no generally repeating pattern). The organizations can additionally or alternatively include portions that may be described as 'bus oriented' (because they carry out nibble-wide, byte-wide, or word-wide parallel operations). The organizations can yet further include portions that may be described as 'matrix oriented' (because they carry out matrix-like operations such as multiplying two multidimensional vectors). These are just examples of taxonomical descriptions that may be applied to various design organizations. Another example is 'control logic', which is less random than fully 'random logic' but less regular than 'bus oriented' designs. There may be many more taxonomical descriptions. The point is that some CPLD structures may be better suited for implementing random logic while others may be better suited for implementing bus oriented designs or other kinds of designs.

Even where a CPLD architecture is specifically designed to mesh with bus oriented designs, the bit width of the bus oriented design may present a problem. More on this later. We first continue describing the usage of CPLD configuring software.

If the CPLD configuring software fails in a first run, the user may choose to try again with a differently-structured CPLD device. The user may alternatively choose to spread the problem out over a larger number of CPLD devices, or even to switch to another circuit implementing strategy such as FPGA or ASIC (where the latter is an Application Specific hardwired design of an IC). Each of these options invariably consumes extra time and can incur more costs than originally planned for.

CPLD device users usually do not want to suffer through such problems. Instead, they typically want to see a fast turnaround time of no more than, say, a few hours between the time they complete their original circuit design and the time a first-run CPLD is available to implement and physically test that design.

Aside from merely being able to implement a specific set of Boolean functions within a given CPLD IC, users of CPLD's also usually insist that the circuit implemented by the CPLD perform according to specified timing requirements. Speed is often as important an attribute as full Boolean correctness. That is why the user chose to use a CPLD instead of an FPGA.

Aside from speed and full function implementation, users of CPLD's also usually want a certain degree of re-design agility (flexibility). Even after an initial design is successfully implemented by a CPLD, users may wish to make slight tweaks or other changes to the original design. The re-design agility of a given CPLD architecture may include the ability to re-design certain internal circuits without changing I/O timings. Re-design agility may also include the ability to re-design certain internal circuits without changing the placement of various I/O terminals (e.g., pins). Such re-design agilities are sometimes referred to respectively as re-design Speed-Locking™ and Pin-Retention (the former term is a trademark of Vantis Corp., headquartered in Sunnyvale, Calif.). The more generic terms 're-design Speed-Consistency' and 're-design PinOut-Consistency' will be respectively used herein interchangeably with 're-design Speed-Locking™' and 're-design Pin-Retention'.

In addition to speed, re-design agility, and full Boolean correctness, users of CPLD's typically ask for optimal emulation of an original design or a re-design in terms of good function packing density, low cost, low power usage, and so forth.

When multiple CPLD's are required to implement a very large original design, high function packing density and efficient use of CPLD internal resources are desired so that implementation costs can be minimized in terms of both the number of CPLD's that will have to be purchased and the amount of printed circuit board space that will be consumed.

Even when only one CPLD is needed to implement a given design, a relatively high function packing density is still desirable because it usually means that a lower cost member of a family of differently sized CPLD's can be selected, or that unused resources of the one CPLD can be reserved for future expansion needs or In-System Configuration re-design (ISC redesign).

In summary, end users want the CPLD configuring software to complete its task quickly and to provide an efficiently-packed, high-speed compilation of the functionalities provided by an original circuit design, or by a design tweak, irrespective of the taxonomic organization of the original design.

Some previous CPLD architectures meshed well with specific taxonomic organizations. However, preferences among taxonomic organizations tend to change over time. Industry standards may, at first, favor designs where address and data words have a size in the range of 8 to 16 bits. Later, industry standards may migrate towards larger-sized organizations of signals such as address and data words having sizes in the range of 32 to 64 bits. A CPLD having an
architecture that is optimized for bus-oriented word sizes of 8 to 16 bits may not be able to efficiently accommodate designs where word sizes increase into a range of, say, 32 to 64 bits. What is needed is a scalable architecture that can accommodate designs having word sizes in the range of 32 to 64 bits without losing speed and re-design agility.

SUMMARY OF THE INVENTION

An improved, scalable CPLD device in accordance with the invention comprises a two-tiered hierarchical switch matrix construct having a Global Switch Matrix (GSM) and a plurality of Segment Switch Matrices (SSM's). Coupled to each SSM is a plurality of programmable logic blocks (referred to as SLB's herein). Each SSM and its plural number of SLB's define a 'segment' that couples to the GSM.

Each SLB can receive B times 125% independent input signals from its respective SSM, where B is a dataword bit-width of a nominal design problem (e.g., a 64-bit wide design problem, in which case 1.25xB = 80). Each SLB can generate product term signals (PTs) that are Boolean products of as many as all of its 1.25 times B independent input terms (e.g., 80 independent input signals). With use of simple allocation and/or 'super-allocation' (where the latter is defined below), substantially large sums of such input-dense PT's may be produced in each SLB. Some of the product terms generated within each SLB may be dedicated to SLB-local controls.

Each SLB has at least 32 macrocells and a plurality of I/O pads associated with the SLB. The macrocells and associated I/O pads of each SLB feed their respectively produced signals directly both to the local SSM and to the global GSM. Note that the direct feeding of result outputs from local-level macrocells to the global-level GSM, while bypassing the SSM, constitutes a breach of normal hierarchy rules. Under normal rules, the local-level macrocells would feed their outputs only to an intermediate-level construct such as the SSM, and the latter would then feed selected ones of such outputs to a next-higher hierarchical structure, such as the global-level GSM. However, in accordance with the invention, hierarchy is circumvented on the output side in order to speed result signals forward for global broadcast to subsequent resources of the CPLD.

Some or all of the plural I/O pads associated with each given SLB may be 'buried', meaning they do not connect to [...] least as many longlines as there are macrocells and I/O pads in the segment, thereby assuring that every macrocell signal (MFB) and every I/O signal (IFB) can be simultaneously transmitted through the GSM from one segment to one or more other segments. The GSM has at least as many longlines for inter-segment (global) communications as do two SSM's. Thus 100% inter-segment (global) communications may occur simultaneously between each of two pairs of segments. Each SLB has at least 4 ways (and in one embodiment, at least 12 ways) of transmitting a feedback signal (MFB or IFB) through the GSM and then back to either its own segment or to one or more other segments.

For one embodiment where B is 64, the 80 (= B times 1.25) parallel inputs of each SLB ease implementation of 64-bit wide design problems. Each segment has at least 64 I/O pads. Some or all of the I/O pads of one or more SLB's in the segment may be buried. Symmetry within the design of each segment allows for implementations such as for 32- or 16-bit wide designs. A convenient migration path is therefore provided by one architecture for implementing 16-bit wide designs, 32-bit designs, and 64-bit designs.

Other aspects of the invention will become apparent from the below detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

The below detailed description makes reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of a design problem that may call for CPLD glue logic with capabilities to efficiently handle differing word sizes;

FIG. 2 is a block diagram showing one combination of a CPLD 'Segment' and a 'Global Switch Matrix' (GSM) in accordance with the invention;

FIG. 3A is a block diagram showing a first CPLD having a plurality of Segments and a common GSM in accordance with the invention;

FIG. 3B is a block diagram showing a second CPLD having a plurality of Segments and a common GSM in accordance with the invention;

FIG. 4 is a legend for symbols used in others of the drawings;

FIG. 5 is a schematic showing a 'Super Logic Block' (SLB) in accordance with the invention and further showing

external packaging pins. Thus the number of external pack- 7 , . , , . r f 

. &&f N-way routing capabmUes provided by couplings through 

aging pins per segment can vary and can go as high as all the 4l _ J *. r t „ • » j 

*ir\ a e lctd i.« i- «i_ . f CrD , the corresponding 'Segment Switch Matrix (SSM) and 

I/O pads of each SLB multiplied by the number of SLB s per throu h the GSM- 
segment. In one embodiment, there are at least 16 I/O pads 

associated with each SLB and 4 SLB's per segment. Thus 50 FIG * * 15 \ schematlc showm g a ^ neral structure of a 

there can be as many as 64 I/O pins or more per segment in rnacrocell module that may be used within the SLB structure 

this embodiment. ^ ^ of FIG. 5; 

Each SSM has within it and dedicated for intra-segment nG - 7A » a schematic showing a method for distribu- 

communications, at least as many longlines as there are tj^Iy multiplexing the SLB outputs of eight segments onto 

macrocells and I/O pads in the segment. This assures that 55 lmes °^ a GSM » 

every macrocell signal (MFB) and every I/O signal (IFB) of FIG. 7B shows a banding scheme that may be used in 

the segment can be simultaneously broadcast through the conjunction with the distributive multiplexing method of 

SSM. Each broadcast signal (longline signal) of the SSM F 1 ^- 7A i 

preferably has at least 3 ways of feeding into a targeted SLB FIG. 7C is a block diagram corresponding to FIG. 3B and 

of the same segment. Thus, an in-segment SLB that gener- 60 further showing how the distributive multiplexing methods 

ates a feedback signal can assuredly transmit the feedback of FIGS. 7A and 7B may be implemented in the second 

signal through its local SSM and then back to either itself or CPLD of FIG. 3B; 

to another SLB of the same segment. CPLD configuring FIG. 8 is a schematic showing how one segment may 

software is thereby given wide flexibility for routing intra- borrow I/O pins from other segments while maintaining 

segment signals. 65 Speed-Consistency and/or Pin Out-Consistency; 

Each SSM further has within it, and as dedicated for FIGS. 9A-9B are schematics showing a sliding bands 

inter-segment (global and/or broadcast) communications, at scheme that may be used to uniformly distribute feedback 
from the GSM to respective SSM's and thereby avoid
routing congestion; 

FIGS. 9C-9D are further schematics showing further 
details of a sliding bands scheme; and 

FIG. 9E is a schematic showing a technique for minimiz- 
ing line length in the GSM to SSM couplings. 

DETAILED DESCRIPTION 

FIG. 1 shows, for purpose of example, a possible system 
100 that uses plural CPLD's such as 115 and 125 for 'glue 
logic'. CPLD 115 is a first integrated circuit having a 
respective plurality of I/O pins 115a (external terminals) that 
are coupled by PCB traces to other circuits provided on a 
printed circuit board (PCB) 101 of the system 100. CPLD 
115 is interposed between a first central processor unit 
(CPU) 110 and one or more, on-board or off-board periph- 
eral devices 119 associated with that first CPU 110. 

CPLD 125 is similarly a second integrated circuit having 
a respective plurality of I/O pins 125a that are coupled by 
PCB traces to other circuits provided on the same printed 
circuit board 101. CPLD 125 is mounted to PCB 101 so as 
to provide interfacing between a second CPU 120 and its 
respective, on-board or off-board peripheral devices 128, 
129. 

In the illustrative case of system 100, CPU 110 is a 64-bit 
microprocessor that has a 64-bit wide, and time-multiplexed, 
address/data bus 112 (A/D bus 112). CPU 110 further has a 
clock input line 113 for receiving a respective first clock 
signal, CLK_A.

Additional control signals (CTL) may be provided on a 
separate control bus 114. Bidirectional bus 114 is depicted 
by a dashed, double-arrow symbol to indicate that it may 
either be physically there in full or it may instead be partly 
or wholly a phantom bus whose CTL signals are instead 
included as time-multiplexed signals that are passed along
the 64-line-wide A/D bus 112.

If CTL bus 114 is real rather than phantom, then a 
corresponding number of the I/O pins 115a of IC 115 will be 
consumed for servicing that real bus. If CTL bus 114 is 
instead phantom, then the same I/O pins 115a that service 
A/D bus 112 can also service the control signals of bus 114 
on a time-multiplexed basis. Of course, for the latter case, 
additional signal-processing resources within the CPLD 115 
may have to be consumed to support the time-multiplexed 
routing of the A/D and CTL signal transmissions through a 
shared set of I/O pins 115a. Also for the latter case, the speed 
at which CPLD 115 can process the A/D signals (112) may 
be disadvantageously reduced because time slices are being 
donated (stolen) to support the transmitting of the CTL 
signals (114). 

Empirical observations have shown that the number of
parallel lines that will generally be needed for carrying the
CTL signals 114 associated with address and/or data signals
of the 64 parallel lines in A/D bus 112 is typically about 8
or slightly more. In other words, the number of
further-and-control carrying, parallel lines in the real or
phantom CTL bus 114 tends to be on the order of about 12.5%
or a slightly larger fraction of the number of parallel lines in
associated A/D bus 112. It can be less as well. In some
instances, the number of further-and-control carrying, parallel
lines in the real or phantom CTL bus 114 can be as large as
25% of the number of parallel lines in the A/D bus 112.

Examples of signals that might be deemed as parallel and 
overhead CTL signals include: error correction bits, frame 
identifying bits, and handshake protocol bits. More 
specifically, if simple parity check is used, then there will
usually be one parity bit for every 8 bits of a data word. Thus,
a 64-bit wide address and/or data word might call for at least
8 additional CTL overhead bits just for providing a simple
error detection function. More bits may be used if more
complex, error catching and correction (ECC) functions are
to be provided.
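As an illustration of the "one parity bit per 8 bits" rule above, the following Python sketch computes the parity overhead bits for a 64-bit word. It assumes even parity and a least-significant-byte-first ordering of the parity bits; the text specifies neither, and the function name `byte_parity_bits` is hypothetical.

```python
from typing import List


def byte_parity_bits(word: int, width: int = 64) -> List[int]:
    """Return one even-parity bit per 8-bit byte of `word`.

    Assumptions (not taken from the source): even parity, and the
    parity bit of the least-significant byte is reported first.
    """
    if width % 8 != 0:
        raise ValueError("width must be a multiple of 8")
    if not 0 <= word < (1 << width):
        raise ValueError("word does not fit in the given width")
    parity = []
    for byte_index in range(width // 8):
        byte = (word >> (8 * byte_index)) & 0xFF
        # Even parity: the bit is 1 when the byte holds an odd number of 1s,
        # so that the data bits plus the parity bit contain an even count.
        parity.append(bin(byte).count("1") & 1)
    return parity


# A 64-bit word needs 64 // 8 = 8 parity bits of CTL overhead.
print(len(byte_parity_bits(0x0123_4567_89AB_CDEF)))  # 8
```

The same function with `width=16` yields the 2 parity bits cited later for a 16-bit A/D bus.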

If A/D bus 112 is used on a time-multiplexed basis, then
one or more additional ones of the CTL overhead bits 114
may serve as frame identifying bits for identifying what
phase and/or process is having its data transmitted at
the moment over bus 112. For example, a first time slice may
be dedicated for a CPU address-outputting phase for a
process involving a first of plural peripheral devices (119)
while a second time slice may be dedicated for a peripheral
data-inputting phase for a process involving a second
peripheral device.

One or more additional ones of the CTL overhead bits 114
may serve as handshake protocol bits whereby peripheral
devices (119) acknowledge error-free receipt of instructions
or data and/or report what state they are next going into
(e.g., bus master or slave).

The CTL overhead bits (114) described above are merely
examples. There is a wide range of possibilities for different
kinds of industry standard buses (e.g., SCSI, PCI, etc.) and
for nonstandard bus designs. Even when one considers a
simple 8:1 parity check with one additional bit for flagging
data versus address, and one further additional bit for
handshaking, it is seen that 10 bits (about 16% of 64) have
already been consumed on the CTL bus 114 for supporting
a 64-bit wide A/D transmission on bus 112. If both an A/D
bus transmission and a control overhead transmission are to
occur in parallel so that the CPLD glue logic 115 can process
both simultaneously rather than in separate time slots, then
the CPLD glue logic 115 needs to have a sufficient number
of external package pins and internal resources to handle
such relatively wide, parallel transmissions.
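The 10-bit tally above can be reproduced with a short calculation. The helper below, `ctl_overhead`, is hypothetical (it is not part of the source): it models the example in the text by counting one parity bit per 8 data bits plus one address/data flag bit and one handshake bit, and reports the fraction of the A/D width that this overhead consumes.

```python
def ctl_overhead(data_width: int, parity_group: int = 8,
                 flag_bits: int = 1, handshake_bits: int = 1):
    """Return (overhead_bits, fraction_of_data_width) for a parallel bus.

    Hypothetical model of the example in the text: one parity bit per
    `parity_group` data bits, plus address/data flag and handshake bits.
    """
    parity_bits = -(-data_width // parity_group)  # ceiling division
    overhead = parity_bits + flag_bits + handshake_bits
    return overhead, overhead / data_width


bits, fraction = ctl_overhead(64)
print(bits, f"{fraction:.1%}")   # 10 15.6%
# Lines needed to carry the A/D word and its CTL overhead in parallel:
print(64 + bits)                 # 74
```

For a 16-bit bus the same model gives 4 overhead bits, or 25%, which shows why narrower buses tend to carry a proportionally larger control overhead.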

Typically, the CTL overhead bits (114) will include a
RESET signal for flagging a global resetting, to a predefined
state, of the CPU 110 and its associated circuits 115 and 119.
Thus, one yet further overhead control signal that may need
to be carried by the actual or phantom CTL bus 114 is such
a global RESET signal. In the illustrated example, the
RESET signal for CPU 110, CPLD 115, and peripheral
device(s) 119 is initiated by actuating a RST_A line that is
coupled to the CPU 110. It then propagates through CTL
buses (e.g., 114 and 117, where the latter is described shortly)
to reach the other devices.

Typically, when the first A/D bus 112 that couples CPU
110 to CPLD 115 is 64-bits wide, there will be a
corresponding 64-bit wide and second A/D bus 116 provided
between the CPLD glue logic 115 and its one or more,
associated peripheral devices 119. It is possible, on the other
hand, that the address/data interface between the CPLD glue
logic 115 and peripheral devices 119 may have a word size
smaller than 64 bits, for example, such as 32 bits or 16 bits.

Also as shown, there will typically be a real or phantom,
second control bus 117 between the CPLD glue logic 115
and peripheral devices 119. The number of parallel control
signals carried by this phantom or real control bus 117 will
usually be about the same as that carried by the first CTL bus
114, although it may be different.

The total number of the I/O pins 115a of IC 115 that are
consumed for servicing A/D buses such as 112, 116 and CTL
overhead buses such as 114, 117 will vary depending on how
wide each and all of the utilized buses are, and to what extent
each is phantom or real. Of course, although it may have more pins, CPLD 115 must have at least as many I/O pins 115a as are necessary for simultaneous servicing of its respective and various buses 112, 116, etc.

In the illustrative system 100, the second CPU 120 is a 16-bit microprocessor that has a corresponding 16-bit wide, time-multiplexed, address/data bus 122 (A/D bus 122). Second CPU 120 further includes a respective clock input line 123 for receiving a respective second clock signal, CLK_B. Additional control signals (CTL) may be provided on a separate, third control bus 124. Bus 124 is shown dashed to indicate that it may either be physically there in full or may instead be partly or wholly a phantom bus whose corresponding CTL signals are included as time-multiplexed signals along A/D bus 122.

The number of further parallel lines that may be needed for carrying control signals 124 associated with address and/or data signals of the 16 parallel lines in A/D bus 122 is typically about 2 or slightly more (about 12.5% or a greater fraction of the number of parallel lines in associated A/D bus 122, e.g., 2 parity bits). In some rare instances, however, the number of further-and-control carrying, parallel lines in the real or phantom CTL bus 124 can be as large as about 50% (about 8 lines). One of the control signals that is carried by actual or phantom CTL bus 124 can be a RESET_B signal for flagging a resetting to a predefined state of second CPU 120 and its associated circuits 125 and 128, 129. In the illustrated example, the RESET_B signal for second CPU 120, second CPLD 125, and corresponding peripheral devices 128, 129 is initiated by actuating a RST_B line that is coupled to the CPU 120.

Typically, when the third A/D bus 122 that couples second CPU 120 to second CPLD 125 is 16-bits wide, there will be a corresponding 16-bit wide and fourth A/D bus 126 provided between the CPLD glue logic 125 and its one or more, associated peripheral devices 128. Additionally or alternatively there may be a corresponding, but narrower or wider fifth A/D bus 127 (e.g., 32-bits wide) provided between the second CPLD glue logic 125 and its one or more, associated peripheral devices 129.

Also, although not shown, there will typically be real or phantom, control buses between the second CPLD glue logic 125 and its associated peripheral devices 128, 129. The number of parallel control signals carried by these phantom or real control buses (not shown) will usually be about 12.5% or less of the number carried by the associated A/D bus, although it may be much higher such as 25% to 50%.

In addition to providing interfaces between different CPU's such as 110 and 120 and their respective peripheral devices 119, 128, 129, the CPLD glue logic circuits 115 and 125 may need to talk to one another over a control bus such as 133. The number of parallel lines provided in CTL bus 133 may be as many as 16, 32 or 64.

Given these various possibilities (FIG. 1), one problem that confronts designers of CPLD integrated circuits is how to arrange both the I/O pins (e.g., 115a, 125a) and internal components of a CPLD IC so that the CPLD (e.g., 115, 125) can operate efficiently both under circumstances where it is processing relatively small datawords having 16 or fewer bits each and where it is processing relatively larger datawords having 32, 64 or more bits each.

Speed is typically of the essence when CPLD's are being used. It is generally not acceptable to serialize a 64-bit wide design into 4 or more time slices that each process one fourth or a smaller fraction of a 64-bit wide operation. Users want the CPLD to process the entire 32, 64, or more bits of their wide datapaths in parallel (simultaneously) so that processing results can be obtained as quickly as possible.

FIG. 2 provides an introduction to a scalable CPLD architecture 200 in accordance with the invention that is designed to provide flexibility and speed.

The structure shown within dashed box 201 is referred to as a 'segment'. In a central portion of this segment structure 201, there is provided a Segment Switch Matrix (SSM) 250. Symmetrically disposed about the SSM 250 there are an even number of programmable logic blocks, such as the illustrated four identical units which are each referred to herein as a Super Logic Block (SLB). The four SLB's are respectively designated here as 210, 220, 230 and 240. Corresponding and identical groups of 16 I/O pads each (where the pads are either buried, or contrastingly, connected to external terminals) are provided respectively for SLB's 210, 220, 230 and 240. The I/O pad groups are respectively designated as 216, 226, 236 and 246.

It is seen from the broad overview of FIG. 2 that a 'segment' 201 is capable of inputting and/or outputting as many as 64 I/O signals simultaneously from the combination of I/O pad groups 216, 226, 236 and 246. The same arrangement 201 may alternatively be used for transceiving the signals of four separate, 16-bit wide buses or for transceiving the I/O signals of two, 32-bit wide buses. SSM 250 can be symmetrically organized to provide efficient operation for 64-bit bus operations, 32-bit wide bus operations, or 16-bit wide bus operations, or even 8/24-bit bus operations if desired.

Referring to SLB 210 as an exemplary representative of the identically-structured other three SLB's of the same segment 201, each SLB receives a first set of 80 input signals from the SSM 250. The first SLB input set for SLB 210 is identified as 211. Independent, but essentially similar SLB input sets of 80 signals each are available to each of the other SLB's 220-240 of the same segment 201 and are each carried by a respective, 80-bits wide bus. The SLB input buses of the other three blocks are respectively designated as 221, 231 and 241.

The signals carried by the eighty parallel lines of first input bus 211 can represent, by way of example, sixty-four simultaneous bits of data or address combined with up to sixteen simultaneous control signals (up to a 25% control overhead). Thus if B is the number of parallel databits of a given design problem, each SLB can support parallel processing of 125% of B up to a value where, for the illustrated segment design, B can be as large as 64 bits. Migrations to larger segment designs where B is 96, 128, and so forth are within the spirit of the present invention. Of course, die size may have to be increased and pin-to-pin delay may suffer if the number (B times 1.25) of SLB input lines per SLB increases without commensurate improvements in the underlying technology (e.g., without using smaller, lower voltage transistors, without using metal interconnect with lower resistivity such as copper, and so forth).

Each of the eighty lines of input bus 211 is a general purpose line that may be used for carrying any kind of input signal. The example that is given above regarding 64 address/data signals and 16 overhead control signals is merely an example to demonstrate how the architecture of segment structure 201 may be exploited to implement a circuit that corresponds to CPLD 115 (FIG. 1), and its 64-bit A/D bus 116 and its associated 16-bit wide, overhead control bus 117. All 80 SLB input signals can be present at a same time as independent signals that are output from SSM 250 and are input through SLB input bus 211 into SLB 210
so that the 80 SLB input signals (211) can be simultaneously processed by SLB 210.

Because a control overhead of 25% is an extreme and the more typical control overhead is about 12.5%, the excess part of the 80 input paths provided by SLB input bus 211 can be seen as a kind of agility insurance that gives CPLD configuring software more degrees of freedom for routing a necessary signal from SSM 250 into SLB 210.

SLB 210 can produce 32 macrocell result signals where each is a sum-of-products function whose product terms (PT's) can each be a product of up to the full 80, general purpose signals provided by input bus 211 or their complements. If desired, the 80 general purpose signals provided by input bus 211 may be used to form: one or more of local control signals for specific macrocells (e.g., I/O_OE) and/or local control signals for specific blocks (e.g., SLB_RST) and/or local control signals for specific segment-wide control functions (e.g., SEG_RST) and/or global control signals for CPLD-wide, global control functions (e.g., GLB_RST).

The sum-of-products (SoP) result signals 212 that are produced by the 32 macrocells of each SLB (210) are also referred to herein as macrocell feedback signals or MFB's. In one embodiment (see FIG. 5), each MFB signal can take on the expressive form:

$$\mathrm{MFB} = \sum_{i=1}^{N=5+} \mathrm{PT}_i\left\{K_{\max}=80 \text{ of } \left(\frac{L1=192}{[8]} + \frac{L2=192}{[8]}\right)\right\} \qquad \text{(Exp. B)}$$

wherein the N=5+ factor indicates that a single-delay, one-pass sum can be a sum of a 'cluster' of at least as many as five product terms (5 PT's), but can be larger with use of allocation. In the expression, Exp. B, each product term, PT_i, can be a Boolean AND of as many as 80 independent input signals.

The Kmax=80 independent input signals of Exp. B can be obtained by sampling from a larger available set of 384 signals. The available set of L=384 lines is subdivided into 192 segment-wide, local lines (L1 or 'Level 1') and 192 global lines (L2 or 'Level 2'). The '[8]' factor that divides into each of the Level 1 (L1) lines and Level 2 (L2) lines indicates the level of partial-population that fills the crosspoints array formed by the intersection of the Kmax=80 lines (bus 211) of each SLB and the crossing L1=192 and L2=192 lines of SSM 250.

In Exp. B, the value of L1 was chosen so that L1 equals at least the number of MFB's in the segment (32 times 4) so that 100% of these can be fed back by the Level 1 interconnect, and additionally at least the number of IFB's in the segment (16 times 4) so that 100% of these can be fed from respective I/O pads to the Level 1 interconnect. Further in Exp. B, the value of Kmax=80 was chosen to be at least 125% of the targeted width of 64-bit wide data buses so that up to 25% additional control overhead can be supported in parallel. Further in Exp. B, the local multiplexing factor of [8] was chosen so that each signal of the Level 1 interconnect has at least 3 ways of being routed from the SSM to a desired SLB. More specifically, Kmax=80 times 8 PIP's per SLB input line, divided by L1=192 SSM local lines, provides a routability of 3.33 ways for the Level 1 interconnect signals.

Moreover in Exp. B, the value of L2 was chosen so that L2 equals at least the number of MFB's plus IFB's in each segment (48 times 4) so that 100% of these can be fed from one segment to a second segment by way of the GSM and the Level 2 interconnect to the SLB's of the second segment. L2 does not have to be the same as L1. It can be larger. However, one of the concerns is the additional time delay involved in routing signals through the GSM from one segment to another. If L2 is increased, the GSM delay tends to disadvantageously also increase. Further in Exp. B, the value of the global multiplexing factor of [8] was chosen so that each signal of the Level 2 interconnect has at least 3 ways of being routed from the SSM to a desired SLB. More specifically, Kmax=80 times 8 PIP's per SLB input line, divided by L2=192 SSM global lines, provides a routability of 3.33 ways for the Level 2 interconnect signals. The routability for Level 2 signals does not have to match that of Level 1 signals. In the particular embodiment of Exp. B, it does.

Qualifier words used above, such as 'single-delay', will be detailed later. Those skilled in the art of CPLD's can appreciate that the N=5+ value can be raised to larger values such as N=10 or N=20 or larger by use of sums re-allocation. In a sums re-allocation operation (see also FIG. 6), a sum of sums (SoS) is formed. For example, four SoP's of 5 PT's each might be ORed together to define a SoP of 20 PT's. The result of a first Sum-of-Sums (SoS) operation may itself be further allocated as a contributing sum to a yet larger SoS (this is referred to as 'super-allocation' herein). For example, if the 20 PT's of a first OR are summed by re-allocation with the 20 PT's result of a second OR gate, a functionally richer result based on 40 PT's is obtained. However, such chained re-allocation (super-allocation) may incur additional gate delays. Each serial passage of a bottle-[...] more OR gates increases the ultimate delay of the resulting Sum-of-Sums. Thus the value N=5+ can be much larger if the circuit designer is willing to tolerate more than a single quantum of allocation delay.

Continuing with our overview of FIG. 2, MFB signals 212 are fed both to SSM 250 and to Global Switch Matrix (GSM) 280. Note that the direct feeding of outputs from local-level SLB's to the global-level GSM, while bypassing the SSM, constitutes a breach of normal hierarchy rules. Under normal rules, the local-level SLB's would feed their outputs only to an intermediate-level construct such as the SSM, and the latter would then feed selected ones of such outputs to a next-higher of hierarchical structures, such as to the global-level GSM. However, here, MFB outputs (212) and/or IFB outputs (217) are being fed directly into the GSM 280.

If the delay of feedback or cascading can be tolerated in a given design, then any one or more of the 80 inputs of SLB input bus 211 can itself be an MFB signal that was originally generated by the same SLB 210 and thereafter fed back through either SSM 250 alone or through GSM 280 and then through the SSM 250 back to SLB 210. Alternatively or supplementally, any one or more of the 80 inputs of SLB input bus 211 can be an MFB signal that was generated by another SLB (220-240) of the same segment (201) and thereafter forwarded by way of SSM 250 and bus 211 to SLB 210. As yet another variation, any one or more of the 80 inputs of SLB input bus 211 can be an MFB signal that was generated by another SLB in a different segment (see FIG. 3) and thereafter passed through the GSM 280, and then through SSM 250 and then through SLB input bus 211 to reach SLB 210.

As yet a further variation, any one or more of the 80 inputs of SLB input bus 211 can be an externally-produced I/O signal (an IFB signal 217). Up to 64 such externally-produced I/O signals can come directly from the up-to 64 I/O pins (nonburied pads) of the illustrated segment 201 while up to another 16 such externally-produced I/O signals can come from I/O pins of other segments. This feature will be discussed further when we reach FIG. 8.
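The Level 1 sizing, the 3.33-way routability figures and Exp. B itself all follow from simple arithmetic and an ordinary sum of products. The Python sketch below checks them under stated assumptions: the constant and function names are hypothetical, inputs are modeled as a list of booleans, and each product term is a list of `(input_index, inverted)` literals. It is a behavioral model only, not the patented circuit.

```python
from typing import List, Sequence, Tuple

K_MAX = 80                # independent inputs per SLB (1.25 x 64)
MACROCELLS_PER_SLB = 32   # MFB sources per SLB
IO_PADS_PER_SLB = 16      # IFB sources per SLB
SLBS_PER_SEGMENT = 4
PIPS_PER_SLB_INPUT = 8    # the '[8]' partial-population factor

Lit = Tuple[int, bool]    # hypothetical literal encoding: (input index, inverted?)


def level1_lines() -> int:
    """L1 must carry every MFB and every IFB of the segment: (32 + 16) * 4."""
    return (MACROCELLS_PER_SLB + IO_PADS_PER_SLB) * SLBS_PER_SEGMENT


def routability(k_max: int, pips_per_input: int, longlines: int) -> float:
    """Average number of PIP paths from one longline into a target SLB."""
    return k_max * pips_per_input / longlines


def product_term(inputs: Sequence[bool], literals: List[Lit]) -> bool:
    """One PT_i: Boolean AND of selected inputs or their complements."""
    if len(inputs) > K_MAX:
        raise ValueError("an SLB sees at most K_MAX independent inputs")
    # inputs[i] != inverted yields inputs[i] when not inverted,
    # and (not inputs[i]) when inverted.
    return all(inputs[i] != inverted for i, inverted in literals)


def mfb(inputs: Sequence[bool], terms: List[List[Lit]]) -> bool:
    """Exp. B: OR of the product terms in one cluster (single-delay sum)."""
    return any(product_term(inputs, t) for t in terms)


print(level1_lines())                                          # 192
print(round(routability(K_MAX, PIPS_PER_SLB_INPUT, 192), 2))   # 3.33
```

For example, with `x = [True, False] + [False] * 78`, the call `mfb(x, [[(0, False), (1, True)]])` evaluates the single product term x0 AND NOT x1, which is true here. Re-allocation in this model would simply concatenate term lists: four 5-term clusters give a 20-term list, and two such lists give 40.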
SSM 250 has 384 vertical longlines (indicated in the drawing by 'V384'). These V384 lines of SSM 250 can simultaneously carry 4 separate sets of 80 independent signals each respectively to the 4 SLB input buses 211, 221, 231 and 241 of SLB's 210, 220, 230 and 240. Note that the number of horizontal crosslines in SSM 250 is 320 (denoted as H320).

One possibility is that roughly one half (e.g., up to 48) or less of each set of a respective 80 signals is constituted by local, segment-generated signals and a remaining roughly one half part (e.g., up to 48) or less of each set of 80 signals is constituted by global signals that were generated outside the segment 201 and transmitted into SSM 250 by way of GSM 280. Note that 4 times 48 is 192, where the number 48 corresponds to the sum of 32 MFB outputs of each SLB plus the 16 IFB outputs (described shortly) associated with each SLB. In the illustrated example, GSM 280 can inject a same number of 192 global signals into a global part of SSM 250 as indicated by bus 285.

The number of global signals that GSM 280 can inject into each SSM does not have to be the same as the number of local signals carried by the SSM 250. It just turned out to be the same for this particular embodiment. For an alternate embodiment, it is contemplated that the GSM will be able to inject up to 256 global signals into each SSM. The number of longlines in the GSM may have to be increased commensurately, however. That means that die size may become larger, more PIP's may need to be provided in each switch matrix, and more delay may be incurred.

Another possibility for our example wherein four sets of 80 signals each are being carried by the 384 longlines of SSM 250 is that two of these sets (a total of 160 signals) are fully constituted by local, segment-generated signals, while the other two of these sets of 80 signals each are fully constituted by global signals that were generated outside the segment 201 and transmitted into SSM 250 by way of GSM 280.

The extra number of lines provided by the difference in SSM parameters, V384-H320=V64, provides a form of insurance or padding for the CPLD routing software. As will be seen in FIG. 5, SSM 250 can contain a large number of crosspoints. (Kmax=80 times L=384 equals 30,720 crosspoints per SLB.) However, not all of these will be populated by PIP's. The partial-populating of crosspoints means that not all conceivable routing paths are available, as would alternatively have been the case if the crosspoints had been fully-populated by PIP's.

It was found by trial and error testing that the job completion speed of CPLD configuring software increases dramatically when the routability factor from each of the SSM longlines (the V384 lines) to a target SLB is increased from about 2 or fewer ways per longline to 3 or more ways (e.g., 3.33 ways on average, which value is calculated from Kmax=80 times 16 PIP's per H-line, divided by 384 V-lines). Increasing the routability factor to better than 4 increases the job completion speed of CPLD configuring software even more, but at the cost of increasing die size significantly and reducing pin-to-pin processing speed. Thus, a routability factor of about 3 and a fraction ways turned out to provide an acceptable balance between speeding CPLD configuration software and producing CPLD chips that are neither too large nor too slow.

The 32 MFB signals (e.g., 212, 222, etc.) that are respectively produced by the 32 internal macrocells of each SLB (210, 220, etc.) may be used to selectively generate a smaller [...] I/O bus 215 (225, etc.) to respective I/O pads 216 (226, etc.). Not all of the I/O pads 216 necessarily connect to an external package pin. Some may be 'buried' pads.

One or more of the sixteen I/O signals on I/O pads 216 may instead be generated outside the CPLD and supplied into the chip by way of respective I/O pins that connect to nonburied ones of the I/O pads 216. The externally-sourced or internally-produced I/O signals may be transmitted by way of bus 217 from I/O pads 216 to SSM 250 and also to GSM 280. Bus 217 may also serve as a path by way of which externally-generated signals enter the CPLD through I/O pads 216 and then enter into the SLB 210 for synchronization before being forwarded via bus 212 or 215 to one or both of SSM 250 and GSM 280. In this latter transfer process, an internal (not shown), data storing portion of SLB 210 may receive the externally-sourced I/O signals 217 for storage and subsequent output onto MFB bus 212 and/or I/O bus 215 as will be described below.

Although the above discussion has focused on SLB 210, it is to be understood that each of SLB 220, 230 and 240 has a similar arrangement of inputs and outputs which are referenced accordingly in FIG. 2. Furthermore, each of SLB 210, 220, 230 and 240 receives four global clock signals (GCLK's) from a global clock bus 290 that extends fully across the CPLD array.

It is seen from the above that SSM 250 receives 192 general purpose, global signals from GSM 280 by way of connection 285. Another 192 input signals of Segment Switch Matrix 250 are defined by the 100% intra-segment return of the four sets of 48 signals each produced by the MFB and IFB resources (buses 212, 217, 222, 227, 232, 237, 242, 247) of the corresponding SLB's 210-240. SSM 250 can be viewed as including a matrix of 384 vertical longlines (V-LL's) and 320 crossing over, horizontal shortlines (H-SL's). The count of the 320 shortlines is formed by the four sets of 80 signals each output from the SSM 250 into respective SLB input buses 211, 221, 231 and 241. The crossed-lines matrix in SSM 250 of 384 vertical lines and 320 horizontal lines is represented by the symbol, V384/H320. This V384/H320 matrix of crosspoints is preferably partially populated by a similar set of PIP's (programmable interconnect points) so that each SSM local longline is loaded by an essentially same and respective number of PIP's, and so that each SSM global longline is similarly loaded by an essentially same and respective number of PIP's, and furthermore so that each SSM shortline is loaded by an essentially same and respective number of PIP's. Thus, for the illustrated embodiment, a respective and substantially same delay is provided by routing to any corresponding SSM output line (of buses 211, 221, 231, 241) either a respective signal from any SSM local input (e.g., 212, 217) or a respective signal from any SSM global input (285).

GSM 280 can directly receive up to 192 general purpose signals from each segment (e.g., 201), can output up to 192 general purpose signals to each segment (by way of bus 285), and can carry as many as 384 inter-segment signals. The H384/(V384 per segment) matrix of crosspoints in GSM 280 is preferably partially populated by a symmetrically distributed set of PIP's (programmable interconnect points) so that each GSM longline (horizontal) is loaded by an essentially same and respective number of PIP's and each to-GSM inputting shortline (vertical) is similarly loaded by an essentially same and respective number of PIP's and furthermore each from-GSM outputting shortline (feeds into

subset of sixteen I/O signals. These 16, SLB-produced I/O 285) is similarly loaded by an essentially same and respec- 

signals may be provided on a tri-stated basis and by way of tive number of PIP's. Thus a substantially same delay is 
provided by routing a signal from any GSM input to any corresponding GSM output.

FIG. 3A illustrates a first CPLD monolithic device 300 in accordance with the invention. One version of monolithic IC 300 employs at least four layers of metal interconnect and transistors with drawn channel lengths of 0.35µ or less and effective transistor channel lengths of 0.25µ or less. The Vdd voltage of such 0.25µ Leff transistors is typically 3.6V or less. The metal interconnect is used for longlines in switch matrices to reduce routing delays. The submicron transistors are used for defining PIP's (programmable interconnect points) having relatively short signal transmission times.

Pin-to-any-other-pin delay time in CPLD 300 can be fixed (SpeedLocked™) to a global base delay. In one embodiment that delay is as short as about 12 nS to 15 nS (nanoseconds) or less. In a second embodiment it is as short as about 10 nS or less. Both figures apply to none-allocated or simply-allocated function signals (e.g., up to 20 PT's per such signal). In the second embodiment, intra-segment pin-to-any-other-pin delay time can be fixed (SpeedLocked™) to a local base delay as short as about 7.5 nS or less for none-allocated or simply-allocated function signals. Pin-to-pin delay time for function signals that are generated with super-allocation (described below) may exceed the base delay values. The global SpeedLocking™ feature will be discussed further when we reach FIG. 8. The difference between simple allocation and 'super'-allocation will be discussed further when we reach FIG. 6.

In one embodiment, the die size is about 12,000 microns by 12,000 microns or less. In that embodiment the GSM longlines are each about 8,000 microns in length or less and are composed of standard aluminum alloys. The standard design rules for such metal lines allow a minimum width of 0.5µ (half a micron) and a minimum inter-line spacing of 0.4µ. Even so, computer simulations have shown that RC effects on delay can be reduced under two conditions:
- the width of the GSM metal lines is increased to about 1.6µ or more but less than about 2.6µ;
- the inter-line spacing is increased to about 0.5µ or more.

The reasons are as follows:
- If metal lines are made too narrow, resistance R dominates the RC time delay.
- If metal lines are made too wide, capacitance C dominates the RC time delay.
- If inter-line spacing is made too small, excessive cross-talk can occur.
- If inter-line spacing is made too large, die size tends to increase, line lengths tend to increase, and pin-to-pin speed suffers.

Careful attention to layout, the resulting line lengths, and the utilized line widths can help to significantly reduce pin-to-pin time delays in the overall CPLD. Of course, exact numbers will vary depending on the details of the semiconductor, metal and dielectric technologies employed.

CPLD 300 comprises eight segments, respectively denoted as A-H, which are provided symmetrically about GSM 380. Each of segments A-H has 64 I/O pads and four SLB's. Some of the I/O pads may be buried ones (e.g., 32 per segment) while the others are connected to external pins. Each SLB contains 32 result-storing macrocells. The illustrated CPLD 300 therefore has 512 I/O pads and 1024 macrocells. There are 128 fully-interconnectable macrocells within each segment. If the pad burial rate is 50%, there will also be 256 I/O pins for the IC device.

There are four global clock (GCLK) pins in CPLD 300. Two of the pins are coupled to programmably-bypassable phase locked loops (PLL's), which then couple to two chip-wide GCLK lines. The other two pins connect directly to two other chip-wide GCLK lines. The PLL's may be used for frequency multiplication and/or division and/or phase adjustment relative to chip-external clock signals.

Each of segments A-H may operate as an independent and self-contained mini-CPLD that has up to 64 I/O terminals and has 128 macrocells. The 384 longlines (horizontal lines) of GSM 380 may be used as a substitute for a printed circuit board which can interconnect the total of 512 I/O pads (buried or not) of the 8 mini-CPLD's in a wide variety of ways.

Alternatively, the 192 global V-lines (vertical lines) of a first Segment Switch Matrix (e.g., SSM_A) can be fully interconnected by way of the 384 H-lines of the Global Switch Matrix (GSM) 380 to receive a corresponding [...] globals from any other segment (e.g., SEG_[...]), [...] double-mini-CPLD. (The local 192 V-lines in each SSM can be used for fully supporting local feedback.)

Alternatively, the 384 H-lines of GSM 380 may be used on a more sparing basis. In that case they couple certain selected MFB and/or IFB signals of any first Super Logic Block (e.g., SLB1_A) to serve as inputs for any other Super Logic Block (e.g., SLB4_H). CPLD configuring software determines how many such global interconnects can be made, based on the interconnect flexibilities provided by the GSM 380 and SSM's A-H.

FIG. 3B illustrates a second CPLD monolithic device 300' in accordance with the invention, having metal line and transistor technology similar to that of the first embodiment shown in FIG. 3A. Due to space limitations in the drawing, the global clock lines are not shown but are understood to be present. An underlying layout for metal lines of the GSM, and of MFB/IFB lines that feed into the GSM, will be seen when we discuss FIG. 7C. In FIG. 3B, segments A, B, G, and H are disposed to define a first square-like collection of segments above GSM 380', while segments C, D, E, and F are disposed to define a second square-like collection of segments below GSM 380'. The combination of segment layouts and the centralized GSM gives the die the general shape of a regular polygon (e.g., a square). This helps to minimize interconnect lengths and thereby improve signal propagation speeds.

I/O pins are clustered near respective SLB's in FIG. 3B so as to minimize the general length of pin-to-SLB wires. There are some trade-offs. For example, SLB A3 is deeper into the core of the chip than is SLB A4. (Note that SLB numbering runs counterclockwise in the NW quadrant of the chip and is mirror-image symmetrical in the other three quadrants.) Pin cluster I/O_A3 is placed closer than pin cluster I/O_A4 to the column formed by SLB's A4 and A3. This shortens the wire length between I/O_A3 and SLB_A3 at the expense of a slightly longer distance between I/O_A4 and SLB_A4.

In one embodiment, the near-core SLB's 3 and 4 of segments B, C, F and G have all their pads buried, so that relatively long pin-to-pad wires are not used for reaching from the chip periphery into the near-core SLB's. In this embodiment, the following pin clusters are not present and are therefore shown as dashed (optional) in FIG. 3B: I/O_B3, I/O_B4, I/O_C3, I/O_C4, I/O_F3, I/O_F4, I/O_G3, and I/O_G4. The corresponding near-core SLB's with fully-buried pads are identified with dashed cross hatching.

Suppose there are 16 I/O pins for each nonburied SLB and zero pins for each fully buried SLB. This embodiment will then sport 384 I/O pins: 8x16 for core segments B, C, G, F, plus 16x16 for peripheral segments A, D, E, H. If, instead, there are 16 I/O pins for every SLB (no buried pads), such an alternate embodiment will sport 512 I/O pins. Flip-chip BGA (Ball Grid Array) packaging may be more appropriate for such an alternate embodiment than a quad-side, peripheral pinout.

FIG. 5 demonstrates how N-way routing flexibility may be provided in the feedback loops of one embodiment 500
due to the provided combinations of switch matrix sizes and multiplexer sizes. However, before FIG. 5 is discussed, the meanings of various symbols therein are explained by referring to the legend 400 of FIGS. 4A-C.

Interchangeability symbol 401 demonstrates that a rectangle 411 with insignia of the form 'Vn' in it represents a set 412 of n parallel lines extending in the vertical (V) direction. The vertical (V) direction is that used in the respective drawing. It does not in any way limit the direction or directions of extension of a given, actual bus, even though that bus is described herein as being 'vertical'.

A particular one line such as 413 may serve as an exemplary representative of the n V-lines of a bus such as 411. The counterpart of the exemplary representative line 413 is shown as 414 in the schematic at the right of interchangeability symbol 401. An arrow may be used to indicate signal direction within the exemplary line 413. The ellipses 415 indicate that the example is understood to be repeated.

Interchangeability symbol 402 demonstrates that a rectangle with insignia of the form 'Hm' in it represents a set of m parallel lines extending in the horizontal (H) direction. The horizontal (H) direction is that used in the respective drawing. It does not in any way limit the direction or directions of extension of a given bus that is described herein as being 'horizontal'.

Of course, when a horizontal first bus crosses with a vertical second bus, a corresponding set of crosspoints will be defined in the actual device. The presence of a crosspoint by itself does not imply that an electrical connection is present there or can be programmably created at that crosspoint. However, as is explained shortly, a set of crosspoints can be fully or partially populated by PIP's (programmable interconnect points) to thereby define a programmable switch matrix.

Interchangeability symbol 403 demonstrates the equivalence between a rectangle with insignia of the form 'Vn/Hm' in it and a crossing of a Vn bus with an Hm bus. Such a crossing usually defines n times m crosspoints.

One-way interchangeability symbol 404 shows the formation of a partially populated, programmable switch matrix at the intersection of a Vn bus and an Hm bus. A peanut-shaped symbol with a number in it, such as 441, represents an exemplary set of partially populating PIP's. In this example, a horizontally-extensive pattern of 3 PIP's is repeated vertically in a staggered and wrap-around manner. As a result, each H-line is generally loaded by a same respective and horizontally-associated number of PIP's (e.g., 3). Each V-line is loaded by a respective and same, vertically-associated number of PIP's (e.g., 2).

The routing function of the peanut-shaped symbol 441 can vary based on whether signal flow is bidirectional or unidirectional. In FIG. 4B, one-way interchangeability symbol 405 shows the case where each 3:1 peanut symbol 443 represents a 3-to-1 multiplexer (MUX), because signal flow is defined by 3 input signals (Hm') and one output signal 444. Configuration memory 445, any associated decoding of its bits, and the selection control port into the MUX are implied by symbol 443.

One-way interchangeability symbol 406 shows the case where each 1:3 peanut symbol 447 represents a 1-to-3 demultiplexer (DEMUX), because signal flow is defined by 3 output signals (Hm') and one input signal 448. Configuration memory 449 and the selection control port are implied by symbol 447.

One-way interchangeability symbol 407 shows how a PIP (represented by a hollow circle) might be implemented by a configuration-memory controlled switch 471. In one state, the PIP creates either a unidirectional or bidirectional connection between the crossing H and V-lines. In a second state, the PIP does not provide a connection between the crossing H and V-lines. Switch 471 may be defined by any one or more of a plurality of elements, such as:
- an NMOS pass transistor;
- a CMOS transmission gate;
- a blowable fuse or makeable anti-fuse;
- one or an opposed pair of tristate drivers;
- and so forth.

Configuration memory 472 can be discrete from controllable switch 471 or an integral part of it. An example of the latter is when switch 471 includes a floating gate transistor and the charge on the floating gate defines a configuration memory state.

One-way interchangeability symbol 408 shows how a GIP (a Gate Input Point, represented here by a hollow diamond) might be implemented by a memory controlled switch 481. In one state, the GIP creates a unidirectional connection between a crossing signal-providing line and a gate input line (GIL). In a second state, the GIP instead couples the gate input line (GIL) to a Gate-input don't-care state 'GiX'.

If the gate on the output end of the GIL is an AND gate, then the don't-care state 'GiX' is a logic '1', because that allows the other inputs of the AND gate to define its output. If the gate on the output end of the GIL is an OR gate, then the don't-care state 'GiX' is a logic '0' for similar reasons. Switch 481 may be defined by any one or more of a plurality of elements, such as:
- an NMOS pass transistor with pull-up;
- a blowable fuse or makeable anti-fuse with pull-up;
- an open collector driver or a tristate driver with pull-up;
- and so forth.

Memory 482 can be discrete from controllable switch 481 or an integral part of it. An example of the latter is when switch 481 includes a floating gate transistor and the charge on the floating gate defines a memory state.

One-way interchangeability symbol 409a demonstrates, for purposes of understanding symbolic equivalence, the relationship between a Vn rectangle with a Full-Diagonal symbol (FD peanut) 491 and a corresponding matrix of crosspoints that are populated by GIP's. The output 493 of AND gate 492 defines a product term (PT) of one or more of all n signals provided by the vertical longlines (V-LL's). The real or theoretical lines that cross with the V-LL's are sometimes referred to herein as shortlines (SL's), even though SL's might be longer than their LL's. Typically, LL's broadcast a set of available signals through an array of SL's. PIP's or GIP's on the SL's can programmably select a subset of the LL-broadcast signals. They then deliver the selected subset to an array of subsequent circuits (e.g., AND gates) provided along the longlines.

Those skilled in the art will recognize that the depiction to the right of symbol 409a is generally more symbolic than practical. One-way interchangeability symbol 409b demonstrates a more realistic implementation of an n-input AND gate. Here, the product term signal 493' is formed by a wired-AND circuit having a pull-up resistor.

Each of the NMOS floating gate transistors, such as 498, 499, etc., receives a respective and pre-complemented one of the n input signals at its gate. Its source is tied to ground and its drain is tied to pulled-up line 493'. If one of the pre-complemented input signals goes high, its transistor pulls line 493' low and thereby performs the Boolean ANDing function.

Charge may be programmably and individually stored onto the floating gate of each of transistors 498, 499, etc., to define whether that crosspoint is active or not. If none of transistors 498, 499, etc. are active, then the pull-up resistor will pull line 493' high to Vcc. The wired-AND function may alternatively be performed by placing a sense amplifier at the end of line 493'. The sense amplifier is designed to produce a logic '1' output if none of transistors 498, 499, etc., is in
a conductive state, and to produce a logic '0' output if one or more of transistors 498, 499, etc., is in a conductive state.

In some instances, it may not be desirable to use a Full-Diagonal (FD) of crosspoint-populating GIP's such as implied by FD peanut symbol 491. For example, suppose each input signal and its complement are simultaneously presented for input into a gate. The theoretical number of gate input lines (GiL's) can then be cut in half, because a gate input signal and its complement will generally not be applied at the same time to a same AND gate or a same OR gate.

Such a condition is illustrated in FIG. 4C, to the right of interchangeability symbol 410. Each of the illustrated hollow bird symbols (421) represents a memory-controlled, 3-to-1 switch. That switch couples the GiL to the supplied input signal, to its complement, or to a Gate-Input don't-care level (GiX). The HD insignia at 495 represents such a Half-full Diagonal condition. The output 497 of AND gate 496 can be configured by the three-way switches (421) to be a product of any desired ones of the supplied input signals (a, a-NOT, b, b-NOT, etc.).

Those skilled in the art will recognize that the depiction to the right of 410 is generally more symbolic than practical. One-way interchangeability symbol 409b again demonstrates a more realistic implementation. For the HD embodiment, additional and like-connected transistors will typically be added onto line 493' to receive the non-complemented signals a, b, c, d, etc. The number n of vertical input lines will therefore be twice the number (up to n/2) of terms that may be ANDed together by the circuit. This relationship between the Vn input lines and the n/2 independent terms that may be ANDed is indicated in FIG. 4C. It appears to the left of leftmost gate symbol 496, as the double-slash symbol and its 'n/2' descriptor.

One-way interchangeability symbol 420 shows how a three-state switch 421 might be formed. Its configuration memory 422 determines whether input signal 'a', 'a-NOT', or a don't-care level (GiX) is applied to the gate input terminal line (GiL). If the receiving gate is an AND gate, then GiX='1'. At least two memory bits are generally needed to define the 3 states.

Those skilled in the art will recognize that a pair of transistors such as 498, 499 in the implementation shown below 409b can be used with a pull-up or pull-down resistor to emulate the operation implied at 420. For example, if an AND gate is being implemented, input signals a and a-bar (a-NOT) will be supplied respectively to the gates of transistors 498, 499. Three configuration memory states can be defined:
- disabling only 498 (receives 'a');
- disabling only 499 (receives 'a-bar');
- disabling both of 498 and 499.

A fourth memory state may also be allowed, in which both of 498 and 499 are enabled. In that state the output 493' of the AND gate will be forced to zero, because at least one of 'a' and 'a-bar' is zero.

Shown to the left of the next one-way interchangeability symbol, 430, is a crown-shaped symbol 431. It essentially represents the inverse of the operation performed by gate-input element 421. The crown-shaped symbol 431 represents a one-to-as-many-as-N-points, programmable 'steering' switch. The switch has one input point (IN) and a plurality of N output points (2 active ones in this example). Steering switch 431 is programmable to steer its input signal (IN) to at least one, programmably-selected one of its N output points. At the same time it applies a predefined default level to each of the remaining N output points that are not specifically selected for receiving the input signal (IN). In other words, those of the N output points to which the IN signal is not specifically steered will instead receive a respective default level (e.g., a GiX level). The one, or optionally more, of the N output points to which the IN signal is specifically steered will receive the so-steered IN signal.

In the illustrated example of one-way interchangeability symbol 430, the two (N) output points of steering switch 431 are, respectively, a first input terminal (GIL1) of a first gate (not shown) and a second input terminal (GIL2) of a separate second gate (not shown). Both the first and second gates (not shown) have a same input don't-care level (GiX). For example, if the first and second gates (not shown) are OR gates, then GiX is a logic '0', and that becomes the default output level of the corresponding steering switch 431.

Suppose configuration memory 432 can select only one specific output point at a time, say the first input terminal (GIL1). Steering switch 431 will then steer the input signal (IN) to GIL1 while applying the don't-care default level (GiX) to the input terminal (GIL2) of the second gate. If configuration memory 432 instead selects GIL2, then the vice versa operation will be performed: the input signal (IN) will be steered to GIL2 while GiX will be applied to GIL1.

If desired, configuration memory 432 can be made larger so that it can programmably select more than one of the N output points of the steering switch 431, while applying a default level to the remaining N output points. In the illustrated example, therefore, configuration memory 432 might be organized as two bits instead of one. In that case memory 432 can be programmed to control each of the illustrated SPDT electronic switches independently.

Typically, the configuration memory 432 of a steering switch 431 is limited to selecting just one of the N output points. This is the usual practice in CPLD's where pass transistors are used for implementing PIP's, and where configuration memory can become excessively large if some restraint is not used. One reason to steer the input signal (IN) to the input terminal of only one gate at a time is speed. Speed can then be maintained without providing an overly large signal-generating driver (not shown) to drive the IN terminal of steerer 431. Another reason is to keep the size of memory 432 relatively small.

Referring to FIG. 5, the illustrated super structure 500 is constituted by a Segment Switch Matrix (SSM) 550, a cooperating part of Global Switch Matrix (GSM) 580, and a plurality of Super Logic Blocks, of which only SLB 510 is shown. Where practical, reference numerals in the '500' century series are used in FIG. 5 for elements that have corresponding counterparts in FIG. 2. Those counterparts are identified by reference numerals in the '200' century series.

As such, the illustrated set 511 of eighty H-lines that emerge from SSM 550 represents the SLB input bus 511 for SLB 510. The illustrated set 522 of thirty-two MFB lines emerges from macrocells area 512. These lines carry the macrocell result signals (MFB's) of SLB 510 back to SSM 550 and also to GSM 580. Pad 516 is a representative one of the 16 I/O terminals of SLB 510. A preselected subset of the I/O pads 516 may be buried if desired. The 16-bit wide bus 517 corresponds to bus 217 of FIG. 2 and includes a connection to macrocells area (MCA) 512. IFB bus 517 merges into a 48-bit wide, combined feedback bus 528. Combined feedback bus 528 then merges into a 192-bit wide, combined, intra-segment feedback bus 529, which feeds into Segment Switch Matrix (SSM) 550.

SSM 550 is divisible into a local-feedback portion 551 (fed by bus 529) and a global-feedback portion 552 (fed from the GSM by bus 585). On each H-line of SLB input bus 511 there is a first, local-servicing, 8-to-1 multiplexer section 553. It is provided in the cross area of bus 511 with a V192 set of lines of the local-feedback portion 551. On each
H-line of SLB input bus 511 there is also a second, global-servicing, 8:1 mux section 554. It is provided in the cross area with a V192 set of lines of the SSM global-feedback portion 552. The combination of sections 553 and 554 defines a 16-to-1 multiplexer.

The 16 PIP's per multiplexer 553/554, multiplied by the 80 SLB input lines (511), results in 1280 PIP's in the cross area of H80 bus 511 and V384 bus 551/552. These 1280 PIP's are generally uniformly distributed in this cross area. They therefore provide, on average, 3.33 ways (1280/384) for a given signal on V384 bus 551/552 to enter SLB 510.

In one embodiment, routability preference is given to the IFB signals carried on the V384 lines of bus 551/552. Consider every triad of 2 MFB-carrying V-lines and 1 IFB-carrying V-line. There are 3 PIP's per SLB on each of the 2 MFB-carrying V-lines and 4 PIP's per SLB on the 1 IFB-carrying V-line. This combination of 10 PIP's per SLB for each triad of V-lines produces the average of 3.33 ways to route a signal from an SSM local line to a given SLB (10/3=3.33).

Each of the 8 PIP's in 8:1 mux section 553 is assigned a respective and mutually exclusive set of V24 lines among the lines of V192 bus 551. The V24 lines assigned to a respective PIP are called the SSM band 557 of that PIP (8 PIP's per MUX_section x V24 lines per PIP_Band = V192 lines). As one steps down the H80 lines of SLB input bus 511, each PIP shifts its position within its respective band 557. This defines the corresponding 8:1 mux section 553 of each H-line. The V24 lines of each PIP band 557 are further divided into a V16 subset for carrying MFB signals and a V8 subset for carrying IFB signals. This split matches the proportion in which SLB 510 produces MFB and IFB signals.

The 1280 PIP's include 640 PIP's in the cross area of H80 bus 511 and V192 bus 552. Because they are generally uniformly distributed, this second set of 640 PIP's provides, on average, 3.33 ways (640/192) for a given signal on V192 bus 552 to enter SLB 510. Other specifics of 8:1 mux section 554 and its respective PIP band 558 are essentially the same as those described for elements 553 and 557. As such, they need not be repeated here.

Up to eighty independent SLB input signals may be carried by H80 bus 511 into SLB 510. The SLB input signal on each of the H80 lines may be chosen from among a respective 16 of the 384 signals carried by the longlines of SSM 550.

Each of the H80 lines is loaded by two capacitances:
- the electrical capacitance of its respective 16 PIP's;
- the electrical capacitance of the one SSM longline to which one of multiplexer sections 553, 554 programmably couples the SSM shortline.

Each of the V384 longlines of SSM 550 is loaded by the electrical capacitance of its 3.33 on average PIP's per SLB (e.g., 4 PIP's for longlines carrying IFB's, 3 for those with MFB's). This load is multiplied by the number of SLB's in the given segment structure.

The eighty independent SLB input signals of bus 511 are supplied to a corresponding set of 80 complementary line drivers. Element 521 is an example of one such complementary line driver; it has both a complementing output and a non-complementing output. The V160 output lines of the 80 complementary line drivers (521) enter area 531 to cross with 179 HD structures. Each HD structure of area 531 can supply as many as 80 independent input signals (a theoretical number) to a respective one of AND gates A0 through A178.

Additionally, a nulling PIP 501 may be provided, in reality or in theory, for each of AND gates A0-A178. It forces the output of its respective AND gate to zero. In general practice, the nulling PIP 501 will not be used. Instead, one of the memory-controlled, 3-to-1 switches 421 (FIG. 4C) of each HD peanut may be replaced by a 4-state switch. This switch further allows both a PT input term and its complement to be simultaneously applied to a respective pair of GiL's of the respective AND gate. Simultaneous application of the input term and its complement will force a zero output. This approach has been discussed above with respect to transistors 498 and 499.

Each of AND gates A0 through A178 produces a respective one of product term signals PT0-PT178. Each respective product term, PTi, can represent the Boolean product of one or more of any of the 80 SSM signals acquired by SLB input bus 511, or of their corresponding complements. A product term PTi can also be set to a fixed logic value when no input term is selected by the respective HD structure of crosspoint area 531:
- logic '0', if the respective nulling PIP 501 or its equivalent is activated;
- logic '1', if the respective nulling PIP 501 or its equivalent is not activated.

A first subset of 160 of the product terms, PT0-PT159, is subdivided into 32 groups (clusters) of 5 PT's each. These are supplied as such to a corresponding set of 32 OR gates, OR0-OR31. Each of OR gates OR0-OR31 produces a respective one of 32 sum-of-products signals, SoP0-SoP31. Each of the SoP0-SoP31 signals can therefore be expressed by the expressive form:

$$SoP_j = \sum_{i \in \text{cluster } j} \Big( X_i \cdot \prod_{k \in K_i} t_k \Big)$$

The symbols are as follows:
- Ki is the set of input terms tk (SLB input signals or their complements) selected for PTi by its HD structure.
- Xi=0 if the corresponding nulling PIP 501 is activated; Xi=1 otherwise.
- The product over Ki equals 1 if Ki is a null subset.

Allocator 560 receives the SoP0-SoP31 signals and produces a derived set of 32 sums-of-sums signals, SoS0-SoS31. Allocator 560 can be programmably configured to cause the SoS0-SoS31 signals to be merely copies of the SoP0-SoP31 signals, respectively. Alternatively, allocator 560 can be designed and programmably configured to cause each respective SoSi signal to follow the expressive form:

$$SoS_i = \sum_{j \in M} X_j \cdot SoP_j \qquad \text{\{Exp. c\}}$$

In Exp. c, Xj=1 if the corresponding SoPj term is to be included in the sum-of-sums result SoSi, and Xj=0 if not. M usually covers a range including the value J=i. Allocator 560 can have a wide variety of designs whose specifics are not directly germane to the overall architecture of the CPLD. In one embodiment, M covers a continuous range including the values J=i-3 and J=i+3 (see FIG. 6).

Each of the 32 macrocells in macrocells area (MCA) 512 will typically comprise an XOR gate that can dynamically or statically define the polarity of its respective SoSi signal. In accordance with DeMorgan's well-known theorem, inversion of a Boolean sum (e.g., SoSi) causes it to appear as a product of its complemented terms, and vice versa. Each of the 32 macrocells will typically further comprise a storage unit for storing the polarity-adjusted SoSi signal. The storage unit can be fixed, or programmably configured, to behave as any one of the following, for example:
- a D-type flip flop with single or dual edge triggering;
- a T-type flip flop (toggling on appropriate clock edge or flat);
- a latch;
- a transparent pass-through element that passes its input directly to its output without substantial delay or alteration.

FIG. 6 shows one example of a macrocell module 600 comprised of an input term signals acquiring means 610
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(e.g., area 531), an AND/OR array 630, an allocator 640, and 
macrocell 650. Macrocell 650 and the remainder of module 
600 constitute a Jth one of an array of like modules that are 
sequentially numbered, as for example in the sequence, J-3, 
J-2, J-l, J, J+l, J+2, J+3, etc. 5 
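Exp. C can be read as a configurable OR over a window of neighboring SoP terms. The Python below is a minimal sketch, not the design of allocator 560 (whose specifics the text leaves open): it assumes each SoP_J is already a single Boolean value, represents the X_J selection bits as a dictionary keyed by module index, and takes M to be the range i-3 through i+3 of the FIG. 6 embodiment. The function name is hypothetical. The final loop checks DeMorgan's theorem, which is why an XOR polarity stage can present a sum as a product of complemented terms.

```python
from itertools import product


def sum_of_sums(i, sop, x):
    """Evaluate Exp. C: SoS_i = OR over J in M of (X_J AND SoP_J).

    sop and x are dicts keyed by module index J. M is assumed to be the
    continuous range i-3 .. i+3 (FIG. 6 embodiment). Missing entries are
    treated as SoP_J = False and X_J = 0.
    """
    m = range(i - 3, i + 4)
    return any(x.get(j, 0) and sop.get(j, False) for j in m)


# Only SoP_J terms whose X_J bit is 1 contribute to SoS_i.
sop = {5: False, 6: True, 7: False, 8: False}
x = {5: 1, 6: 0, 7: 1}
print(sum_of_sums(5, sop, x))  # SoP_6 is true but X_6 = 0, so not included

x[6] = 1
print(sum_of_sums(5, sop, x))  # now SoP_6 is included

# DeMorgan: NOT(a OR b OR c) == (NOT a) AND (NOT b) AND (NOT c)
for a, b, c in product([False, True], repeat=3):
    assert (not (a or b or c)) == ((not a) and (not b) and (not c))
```

Because X_J gates each term before the OR, clearing a single configuration bit removes a neighbor's SoP from the sum without disturbing the others.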

The AND/OR array 630 comprises a first array 631 of five AND gates, A0-A4, a second array 632 of up to five PT-steering elements, and a first OR gate 633 which generates a respective sum-of-products signal, SoP_J, where the latter can be the sum of as many as 5 PT's.
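The AND/OR structure of array 630 can be sketched in a few lines of Python. The sketch assumes each input arrives as a Boolean from a complementary line driver such as 521, so each literal uses either the non-complemented or the complemented output; the `nulled` flag stands in for nulling PIP 501. Function names are illustrative only.

```python
def product_term(inputs, literals, nulled=False):
    """One AND gate (e.g., A0) fed through the HD structures of area 531.

    literals maps an input index to True (use the non-complemented driver
    output) or False (use the complemented output). nulled=True models a
    nulling PIP 501 that forces the gate output to 0.
    """
    if nulled:
        return False
    return all(inputs[k] == want for k, want in literals.items())


def sum_of_products(pts):
    """First OR gate 633: SoP_J is the OR of up to 5 steered PT's."""
    if len(pts) > 5:
        raise ValueError("OR gate 633 accepts at most 5 product terms")
    return any(pts)


inputs = [True, False, True]              # 3 of the up-to-80 SLB inputs
pt0 = product_term(inputs, {0: True, 1: False})
pt1 = product_term(inputs, {2: False})
print(sum_of_products([pt0, pt1]))
```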

Each of the PT-steering elements 632 is a one-to-one-of-N steerer (N=2, 3, etc.) which can be programmably configured either to steer its respective PTi signal to an input terminal of first OR gate 633, or to supply a logic '0' (GND) to that terminal of OR gate 633. If the respective PTi signal is not steered to OR gate 633, the PTi signal may instead be steered to an i-th local control within the Jth macrocell module 600. If the respective PTi signal is not steered to the i-th local control, then the respective PT-steering element 632 provides a predefined default control signal on the line 634 of that respective i-th local control.
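For the simplest case, N = 2, a PT-steering element reduces to a two-way switch. The sketch below assumes that case; the `Steer` enumeration, the function name and the default control value are hypothetical, since the text leaves the nature of each local control open.

```python
from enum import Enum


class Steer(Enum):
    TO_OR = 0     # PTi drives an input terminal of first OR gate 633
    TO_LOCAL = 1  # PTi drives the i-th local control line 634


def steer_pt(pt, config, default_control=False):
    """Two-way (N=2) PT-steering element 632.

    Returns (or_input, local_control). The destination that does not
    receive PTi gets GND ('0') on the OR side, or the predefined default
    value on the local-control side.
    """
    if config is Steer.TO_OR:
        return pt, default_control
    return False, pt


print(steer_pt(True, Steer.TO_OR))
print(steer_pt(True, Steer.TO_LOCAL))
```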

The exact nature of each of the local controls that can respectively receive up to all five of the steerable PT signals can vary and is beyond the scope of the present invention. By way of example though, one of the optionally re-directed PT signals that are steered through the local-control lines 634 of PT-steering elements 632 can be applied to a respective terminal 652 of a soon-described XOR gate 651. Others of the optionally re-directed PT signals can be respectively applied for controlling the polarity and/or edge-sensitivity of the CLK input of storage element 660. They can also be applied to a mode control 659 for causing element 660 to function as a desired one of a D-type flip flop (IN=D), a T-type flip flop (IN=T), a latch (IN=L), or a combinatorial pass-through element (IN=C), where in the last mode, C, the IN signal of element 660 is passed directly to Q output 661 without intermediate storage. Another application filed concurrently herewith or shortly after describes specific embodiments of macrocell module 600 that constitute a separate invention.

The SoP_J sum-of-products signal of the first OR gate 633 is next supplied to a post-SoP steering element 642 of allocator 640. The SoP_J signal can be steered to a memory-specified one (or optionally more) of the output destinations of steering element 642, while the remaining output destinations generally receive a don't-care level, typically a logic '0'.

Module 600 includes a second OR gate 645 having a plurality of input terminals 645i for receiving the input signals that make up SoS_J, and an output terminal 645o for producing a respective SoS_J output signal. Inputs 645i can come from other modules (e.g., J+1, J-1, etc.) as well as from the same Jth module 600.

For a first example, assume that the only inputs 645i of second OR gate 645 are those from the SoP outputs of modules J-3 through J+3. (This series includes J itself but, for the moment, pretends that the J+4 and J-4 elements of illustrated inputs 645i are not there.) Assume further that the respective post-SoP steering elements (642) of macrocell modules J, J+1 and J+2 steer their respective sum-of-products signals, SoP_J, SoP_(J+1) and SoP_(J+2), to the input terminals 645i of the SoS_J OR gate 645, while the respective post-SoP steering elements (642) of the remaining modules, J+3, J-1, J-2 and J-3, steer their respective sum-of-products signals elsewhere. As a result, the SoS_J output signal on line 645o will represent the Boolean sum of SoP_J, SoP_(J+1) and SoP_(J+2). The SoP_(J+1) term, for example, is supplied from the (J+1)th macrocell module by line 643. A GND signal (a steady logic '0') will arrive from the (J-1)th macrocell module in this example by way of line 644. The delay for producing the sum-of-three-sums result of this example, SoS_J = SoP_J + SoP_(J+1) + SoP_(J+2), will simply be the parallel gate delays through elements 630 and 642 of all modules plus the gate delay of second OR gate 645 in module J.
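The first example can be checked exhaustively with a small model of the allocator. In this sketch (an assumption, not the circuit of FIG. 6), `steer_to[j]` names the module whose OR gate 645 receives SoP_j from post-SoP steering element 642, and any module missing from the map steers elsewhere. The loop verifies, over all 128 input combinations for modules J-3 through J+3, that SoS_J equals SoP_J + SoP_(J+1) + SoP_(J+2).

```python
from itertools import product


def sos(target, sop, steer_to, cascade_in=()):
    """Second OR gate 645 of module `target`.

    sop[j] is SoP_j; steer_to[j] is the module receiving SoP_j (absent or
    None means "steered elsewhere"). cascade_in holds any SoS values
    steered in from other modules through post-SoS steering elements.
    """
    terms = [sop[j] for j in sop if steer_to.get(j) == target]
    return any(terms) or any(cascade_in)


J = 10                                     # arbitrary module index
modules = range(J - 3, J + 4)
steer_to = {J: J, J + 1: J, J + 2: J}      # J+3, J-1, J-2, J-3 go elsewhere

for bits in product([False, True], repeat=len(modules)):
    sop = dict(zip(modules, bits))
    expected = sop[J] or sop[J + 1] or sop[J + 2]
    assert sos(J, sop, steer_to) == expected
```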

Note that the SoS_J output of OR gate 645 can optionally be fed through one or both of post-SoS steering elements 646 and 647 to other macrocell modules. Similarly, every Nth-away macrocell module such as J-4k and J+4k (where k=1, 2, 3, etc.) can steer its respective SoS output to inputs 645i of macrocell module J.
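The sketch below lists, for one module J, which neighbors can reach its OR gate 645 directly (SoP outputs of J-3 through J+3) and which can cascade an SoS output into it (J-4k and J+4k). It assumes, hypothetically, a linear array of the 32 macrocell modules of one SLB indexed 0 to 31 with no wrap-around, and a stride of 4 as in the J-4k/J+4k example; the function names are illustrative.

```python
def sop_sources(j, n=32):
    """Modules J-3 .. J+3 whose SoP can be steered into OR gate 645 of J."""
    return [m for m in range(j - 3, j + 4) if 0 <= m < n]


def sos_sources(j, n=32, stride=4):
    """Modules J-4k and J+4k (k = 1, 2, ...) that can cascade an SoS into J."""
    return [m for m in range(j % stride, n, stride) if m != j]


print(sop_sources(10))
print(sos_sources(10))
```

Under these assumptions, every module in the same residue class modulo 4 is a potential cascade source, which is what lets sums grow well beyond the seven-module SoP window.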

By way of a second example, assume that the inputs 645i of OR gate 645 now include not only the SoP outputs of modules J-3 through J+3 but also the illustrated SoS steered outputs of modules J-4 and J+4. (Note the subtle but important difference between SoS outputs 645o and SoP outputs 642o here. SoP outputs are produced by the first OR gate 633 of their respective macrocell module, while SoS outputs are produced by the second OR gate 645 of their respective macrocell module.)

Assume further for the second example that the respective post-SoP steering elements (642) of macrocell modules J and J+2 steer their respective sum-of-products signals, SoP_J and SoP_(J+2), to the input terminals 645i of the SoS OR gate 645, while the respective post-SoP steering elements (642) of the remaining modules, J+1, J+3, J-1, J-2 and J-3, steer their respective sum-of-products signals elsewhere. Assume yet further that a post-SoS steering element (646) of macrocell module J+4 steers its respective sum-of-sums signal, SoS_(J+4), to a corresponding one of the input terminals 645i of the SoS_J OR gate 645, while the post-SoS steering element (647) of macrocell module J-4 steers its respective sum-of-sums signal, SoS_(J-4), elsewhere. As a result, the SoS_J output signal on line 645o of macrocell module J will represent the Boolean sum of SoP_J, SoP_(J+2) and SoS_(J+4). The sum-of-sums output SoS_(J+4) of macrocell module J+4 is thereby cascaded into an input 645i of the SoS OR gate 645 of macrocell module J. Unlike the first example, the delay for producing this result will include parallel gate delays through elements 630 and 642 of modules J+4 and J, plus serial gate delays through elements 645 and 646 of modules J+4 and J. However, the number of Boolean sums that can be represented by the SoS_J output signal (645o) includes the number of Boolean sums that can be represented by the SoS_(J+4) output signal. As such, the complexity of resultant sums can be greatly increased. The cost, of course, is the delay penalty of serially cascading macrocell modules such as J+4 (not shown) and J.
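The delay trade-off between the two examples can be made concrete with a toy timing model. The stage delays below are hypothetical placeholders in arbitrary units, not figures from the text; only the structure follows the description: the first example pays one pass through OR gate 645 after the parallel 630/642 stages, and each level of cascading adds a post-SoS steering hop (646) and another pass through 645. The loop also confirms that folding SoS_(J+4) into SoS_J yields the Boolean sum of all underlying SoP terms, assuming for illustration that SoS_(J+4) is itself the sum of two SoP terms.

```python
from itertools import product

# Hypothetical per-stage delays in arbitrary units (not from the source).
T_ARRAY = 1.0  # AND/OR array 630
T_STEER = 0.3  # post-SoP steering 642 or post-SoS steering 646
T_OR2 = 0.5    # second OR gate 645


def delay(cascade_levels):
    """Parallel 630 + 642 stages, one OR 645, plus (646 + 645) per cascade level."""
    return T_ARRAY + T_STEER + T_OR2 + cascade_levels * (T_STEER + T_OR2)


def sos_j(sop_j, sop_j2, sos_j4):
    """Second example: SoS_J = SoP_J + SoP_(J+2) + SoS_(J+4)."""
    return sop_j or sop_j2 or sos_j4


for a, b, c, d in product([False, True], repeat=4):
    sos_j4 = c or d  # assumed: SoS_(J+4) = two SoP terms summed at module J+4
    assert sos_j(a, b, sos_j4) == (a or b or c or d)

print(delay(0), delay(1))  # first example vs. one level of cascading
```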

The SoS_J output signal (645o) is supplied to one input of XOR gate 651, while a polarity control signal 652 is supplied to the other input. The polarity-adjusted result can be routed to the D-or-T-or-L-or-C input of storage/pass-through element 660 by way of multiplexer 653. Alternatively, multiplexer 653 can route a desired IFB signal (I/O feedback) of the SLB, or another kind of signal, to the D/T/L/C input (IN) of element 660. In one embodiment, respective ones of the 16 IFB signals are fed one to each of two of the 32 macrocell modules, such that each IFB signal can be stored in either a selected one or both of two macrocell modules. The Q output of storage element 660 becomes the MFB_J (macrocell feedback) signal 661 of the corresponding macrocell module J.
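A behavioral sketch of the path from XOR gate 651 through storage/pass-through element 660 follows. It is a simplified software model under stated assumptions: clocking is reduced to a Boolean `clock_edge` flag, latch transparency to a `latch_open` flag, and the reset/set multiplexers 656 and 657, dual-edge triggering and clock-polarity control are omitted. The class and parameter names are hypothetical.

```python
def xor_polarity(sos, invert):
    """XOR gate 651: polarity control 652 inverts SoS_J when set."""
    return sos != invert


class Element660:
    """Hypothetical model of storage/pass-through element 660."""

    MODES = ("D", "T", "L", "C")

    def __init__(self, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode {mode!r}")
        self.mode = mode
        self.q = False  # Q output 661, i.e. MFB_J

    def step(self, inp, clock_edge=False, latch_open=False):
        """Evaluate once and return Q."""
        if self.mode == "C":
            self.q = inp                # combinatorial pass-through
        elif self.mode == "L" and latch_open:
            self.q = inp                # transparent while latch is open
        elif self.mode == "D" and clock_edge:
            self.q = inp                # capture on active clock edge
        elif self.mode == "T" and clock_edge and inp:
            self.q = not self.q         # toggle on active clock edge
        return self.q


# Usage: invert SoS_J, then store it in a D-type flip flop.
ff = Element660("D")
mfb = ff.step(xor_polarity(True, invert=True), clock_edge=True)
print(mfb)
```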

The clock, reset and set terminals of storage element 660 
receive respective control signals by way of respective 
multiplexers 655, 656 and 657, each of which is configured 
by a respective part of configuration memory. The routed clock, reset and set signals can respectively include G_CLK's (up to 4 of them), SLB_CLK, SLB_RST and SLB_SET signals. FIG. 5 shows that these SLB_CLK, SLB_RST and SLB_SET signals can be produced by respective AND gates A160, A161 and A162 as independent PT signals. The default is a logic '0' if PT signals are not so used for respectively generating the SLB_CLK, SLB_RST and SLB_SET signals.

Continuing in FIG. 5, the 32 MFB result signals (bus 522) of macrocells area 512 can be passed through an Output Switch Matrix (OSM) 570 for application to programmably-selected input terminals of 16 tristate drivers 526 of the SLB 510. Respective output enable (OE) terminals of tristate drivers 526 are driven by independent PT signals produced by respective AND gates A163-A178 of SLB 510. Each of the 16 tristate drivers 526 may have an independently configurable slew rate (control not shown).

The illustrated OSM 570 is structured as an H32+/V16 partially-populated switch matrix. (The H32+ part indicates that there optionally may be more than 32 horizontal lines, as will be explained.) Multiplexer size may be in the range of 4:1 through 16:1. Thus each I/O pad 516 can have an MFB signal programmably routed to it from any one of between 4 and 16 macrocells of the same SLB (or optionally from other macrocells of other SLB's). The OSM 570 gives CPLD configuring software flexibility in placing a particular function in one macrocell and then routing it to a desired output pad 516. This feature may be used for realizing re-design PinOut-Consistency (re-design Pin-Retention™): the same I/O pad may be used for a given function even though re-design causes the CPLD configuring software to shift the placement of the implementing macrocell within the SLB. The dashed plurality of MFB lines 523 represents an optional addition of more horizontal shortlines that may be added into the H32+ parameter of OSM 570 so that pad 516 may receive MFB outputs from neighboring SLB's if desired. Of course, this can disadvantageously increase die size and propagation time through the OSM.

Combined signal bus 528 is formed, as already explained, by combining the 32 MFB signals of bus 522 with the 16 IFB signals of bus 517 to thereby provide 48 feedback signals (MFB+IFB) per SLB. The buses 528 of the four SLB's in a segment (201 in FIG. 2) are combined to define the 192 lines of bus 529. Bus 529 feeds into the V192 bus 551 of SSM 550. V192 bus 551 can therefore simultaneously carry all the feedback signals (MFB+IFB) of the four SLB's 210-240 of its segment. Intra-segment communications can therefore be provided at the full 100% level irrespective of what happens at the inter-segment (global) communications level. In other words, each segment can operate as its own, fully contained and independent mini-CPLD. The partially-populated cross areas of each of the four SLB input buses (511) and the local V192 bus 551 give each locally-generated feedback signal (MFB or IFB), on average, 3.33 ways of being fed from any one SLB to the same or another SLB of the same segment (201). That means at least 3 ways for every signal and 4 ways for some signals (e.g., IFB's). This gives CPLD configuring software wide latitude in picking routings for intra-segment communications and helps to shorten the job completion time of the CPLD configuring software.

Each of the 48 lines of bus 528 further feeds into a respective 1:3 demultiplexer on GSM 580. Peanut symbol 583 represents one such 1:3 demultiplexer among a plurality of like but staggered demultiplexers. Line 581 represents an exemplary GSM-feeding line among the 48 lines of bus 528.

The splitting of bus 528 into two bands of 24 lines each is shown at 527 merely to allude to a segment-intertwining technique that will be discussed when FIGS. 7A-7B are examined. In one embodiment, the 1:3 DEMUX 583 couples to respective longlines of the GSM such as line 587 by passing its respective, demultiplexed signal 581 through a configurable multiplexer 584 that can further receive other like demultiplexed signals from the respective other 1:3 demultiplexers (583) of other segments. The output of exemplary multiplexer 584 is applied to a tristate longline driver such as 586. While not explicitly shown in FIG. 5, it is to be understood that each GSM longline (e.g., 587) will generally have a plurality of tristate longline drivers such as 586 distributively coupled to the GSM longline for driving their respective signals (e.g., 581) onto the GSM longline. Contention may be avoided on each GSM longline by enabling the output of no more than one such tristate longline driver at a time for the given longline. The OE (output enable) control terminals of the tristate longline drivers 586 may be controlled either statically by configuration memory or dynamically by applying steered product terms to such OE control terminals. The embodiment shown in FIG. 7A uses static control.

The 384 horizontal longlines of GSM 580 (of which 587 is an exemplary one) cross with the 192 vertical shortlines of bus 585. The 384 times 192 resulting crosspoints are partially populated by 8:1 multiplexers such as 588. Signal routability from any given GSM line such as 587 to a desired SSM (e.g., 550) is therefore 192x8 divided by 384, or 4 ways per GSM H-line. Signal routability from any given GSM-feeding line such as 581 to a desired segment (e.g., SSM 550) is the product of the 3 ways per GSM V-line provided by DEMUX 583 times the 4 ways per GSM H-line provided by multiplexers 588, which gives a result of 12 ways. In other words, when CPLD configuring software wants to route a given GSM input (e.g., on line 581) to a specified segment (e.g., that of SSM 550), the software has a total of 12 different paths to choose from. If one is already consumed for servicing another signal, then the software can try another path, and another, until all 12 are exhausted. Thereafter, the CPLD configuring software can try re-routing the signal to another GSM input (e.g., the next one after line 581). This re-try can include shifting non-critical signals (where speed is not as important) through OSM 570 and IFB bus 517. Note that OSM 570 provides additional routing flexibility. This factor multiplies with the 12 ways provided through the GSM when IFB signals are being considered and PinOut-Consistency is not needed, such as when an IFB signal moves through a buried pad 516.

Thus CPLD configuring software is given a relatively wide range of routing possibilities by the N-way routing options provided in one or both of the local feedback loops (529) and global feedback loops (585). Moreover, because a consistent, same first delay can be provided in the L1 loops (529, 551) and a respectively consistent, same second delay can be provided in the L2 loops (580, 585, 552), it is relatively easy for the CPLD configuring software to provide Speed-Consistency.

External signals can be fed into the CPLD from the pins of nonburied ones of pads 516. The input path of such externally-supplied signals can be purely intra-segment, such as moving from pad 516, through input buffer 536 and through IFB bus 517 directly to macrocells area 512. For the embodiment of FIG. 6, this direct path 517 into MCA 512 continues into multiplexer 653. The externally-supplied signal can then be temporarily stored in element 660 for synchronization with a chip-internal clock (655), or it can be
passed through asynchronously onto MFB bus 522 if storage element 660 is in either the latch (L) or the combinatorial (C) mode.

The input path of an externally-supplied signal can additionally or alternatively be a global (inter-segment) one. The signal can propagate from pad 516, through input buffer 536 and through a GSM-feeding line 581 into the Global Switch Matrix 580. From there, it can be broadcast into any one or more segments, as desired, by way of the 192 per-segment multiplexers.

A summarizing review of FIG. 5 shows that, in accordance with the invention, an improved and scalable CPLD architecture has been developed that features a two-tiered hierarchical switch matrix construct which helps CPLD configuring software to provide Speed-Consistency and/or PinOut-Consistency. The two-tiered hierarchical switch matrix construct has a Global Switch Matrix (e.g., GSM 580) and a plurality of Segment Switch Matrices (e.g., SSM 550). Coupled to each SSM is an even plurality of at least four programmable logic blocks (e.g., SLB 510; see also 210-240 of FIG. 2). Each SSM and its even number of SLB's define a segment (201) that couples to the GSM both for injecting SLB result signals 522 and/or I/O pin input signals 517 into (528, 581) the GSM and for extracting (588) globally-provided signals (585) from the GSM for input (511) by way of the SSM 550 into each SLB of a given segment.

Each SLB has at least 80 complementable inputs (521) and can generate product term signals (PT's) that are Boolean products of as many as 80 independent input terms. With use of allocation (560, 640), large sums of such large PT's may be produced in each SLB. Each sum-of-products signal (MFB 522) can take on the expressive form:

$$f\left(n = 80 \,/\, \left(192_{\mathrm{Local}} + 192_{\mathrm{Global}}\right)\right) \qquad \text{\{Exp. B1'\}}$$

Some of the product terms generated within each SLB are dedicated to SLB-local controls such as SLB-wide clock, set and reset controls (A160-A162) and such as I/O drive enable controls (A163-A178).

Each SLB has at least 32 macrocells and at least 16 I/O pads (buried or nonburied) which feed back both to the local SSM (by way of path 528) and to the global GSM (by way of path 527). Each SSM has, dedicated for intra-segment communications, at least as many longlines (48x4) as there are macrocells (32x4) and I/O pads (16x4) in the segment, thereby assuring that every macrocell signal (MFB) and I/O signal (IFB) can be simultaneously transmitted through the SSM. Each SLB has at least 3 ways (553) of transmitting a feedback signal through its local SSM and then back to either itself or another SLB of the same segment. CPLD configuring software is thereby given good flexibility to route intra-segment signals.

Each SSM further has, dedicated for inter-segment (global) communications, at least as many longlines (552) as there are macrocells and I/O pads in the segment, thereby assuring that every macrocell signal (MFB) and I/O signal (IFB) can be simultaneously transmitted through the GSM from one segment to another segment. The GSM has at least as many longlines for inter-segment (global) communications as do two SSM's. Thus 100% inter-segment (global), unidirectional communications may occur simultaneously between each of two pairs of segments. In other words, four segments may be fully intercoupled through the GSM on a pair-wise basis. Each SLB has at least 12 ways of transmitting a feedback signal (MFB or IFB) through the GSM and then back to either its own segment or another segment.

The 80 parallel inputs (511) of each SLB ease implementation of 64-bit wide designs. Each segment has at least 64 I/O pads (516). Symmetry within the design of each segment allows for more finely-granulated implementations, such as for 32- or 16-bit wide designs. A convenient migration path is therefore provided by one unified architecture for implementing 16-bit wide designs (e.g., bus 122 of FIG. 1), and/or 32-bit wide designs (e.g., bus 117 of FIG. 1), and/or 64-bit wide designs (e.g., bus 112 of FIG. 1).

FIG. 7A illustrates one subsystem 700 that may be used for distributively multiplexing the segment output signals of 8 segments onto respective longlines of the GSM while [...] the 1:3 demultiplexer function of element 583 [...] 710, 720 and 730 (GSM_LL.1, GSM_LL.2 and GSM_LL.3, respectively). Line 781 corresponds to 581 of FIG. 5. GSM-feeding line 781 has 3 drop points into subsystem 700, respectively identified as drop points 781a, 781b and 781c.

The GSM will thus have a total of 3 drop points per GSM-feeding line (781) times 192 such feeding lines per segment times 8 segments (for the embodiment of FIG. 3), or 4,608 drop points. This large number can be factored as 384 times 12. Given that there are 384 longlines in the GSM of the illustrated embodiments, each GSM longline will need to service 12 drop points.

It is desirable to distribute the 12 drop points of each GSM longline such that all 8 segments (A-H) are serviced uniformly and with minimal delay. The number 8 does not divide integrally into 12. However, the number 4 does. If the 8 segments (A-H) of the illustrated embodiments are re-grouped as pairs, then we obtain 4 such pairs. Any pairing arrangement may be used. Because in FIGS. 3A-3B segment A is vertically aligned with and adjacent to B, C is likewise adjacent to D, and so on, the following pairings were made: A/B, C/D, E/F and G/H. An alternate embodiment of FIG. 3A might use the following pairings: A/C, B/D, E/G and F/H.

In FIG. 7A, four 3:1, tri-stateable multiplexers are used for each GSM longline. GSM_LL.1 (710), for example, is serviced by the respective four tri-stateable multiplexers 711, 712, 713 and 714. GSM_LL.2 (720) is serviced by respective multiplexers 721, 722, 723 and 724. GSM_LL.3 (730) is serviced by respective multiplexers 731, 732, 733 and 734. And so on.

Each of the 3:1, tri-stateable multiplexers (e.g., 711) has two configuration memory bits for defining a corresponding set of four states, namely: (1) select input 0.1 for output, (2) select input 0.2 for output, (3) select input 0.3 for output, and (4) cause the output to go into a high-impedance (HI-Z) state. The use of four 3:1, tri-stateable multiplexers with two configuration memory bits each was a design choice which increases the number of configuration memory bits beyond the minimum needed, in order to gain faster signal propagation speed. In one embodiment, propagation time through the GSM, including through subsystem 700, is about 2 to 2.5 nS. In the same embodiment, intra-segment signal propagation time from one pin to another is about 7 to 7.5 nS. Referring momentarily to FIG. 8, if global routing is to be used (through the GSM 580) to connect a pin 516a of a first segment (e.g., A) to an SLB (e.g., 510H) in a different segment (e.g., H), then pin-to-pin propagation time can still be as low as about 9 to 10 nS or less.

An alternate embodiment for FIG. 7A can use a 13-state, tri-stateable multiplexer having four configuration memory bits instead of the 8 consumed by, for example, multiplexers 711-714. The first 12 states would select a respective one of the 12 drop points, while the 13th state would be the HI-Z
output state. However, signal propagation time through such a 13-state, tri-stateable multiplexer might be longer. More specifically, FIG. 7C shows how a distribution of four 3:1 multiplexers (with HI-Z as a fourth state for each) can help to reduce the length of lines for bringing SLB output signals down into the GSM 780. One exemplary GSM line 710' has multiplexers 711' and 712' respectively feeding it from segment pairs A/B and C/D. Another exemplary GSM line 720' has multiplexers 723' and 724' respectively feeding it from segment pairs E/F and G/H.

Wires 727' correspond to the band-intertwined bus 527 of FIG. 5. Bundle 727 is also seen as coming from the C/D pair in FIG. 7A.

As should be clear from FIG. 7A, each of multiplexers 711-714 services a respective one of the four segment pairs, A/B, C/D, E/F and G/H. Line 781 comes from the A/B segment pair. For multiplexer 711, two other lines of the A/B segment pair attach to the drop points designated as A/B.2 and A/B.3. For multiplexer 721, two other lines of the A/B segment pair attach to the drop points designated as A/B.4 and A/B.6. For multiplexer 731, two other lines of the A/B segment pair attach to the drop points designated as A/B.7 and A/B.8.

It is possible that some GSM longlines may go unused. For example, if all of multiplexers 711-714 are placed in the HI-Z state, line 710 might float and propagate noise through the chip. To prevent such an undesired condition, weak latches such as 715, 725 and 735 are coupled to the respective longlines. The outputs of these weak latches (715, 725, 735, etc.) are designed to be easily overcome by either a logic '0' or logic '1' output of one of the multiplexers (e.g., 711-714) of the respective GSM longline. The input inverter of each such weak latch has enough gain to ensure bi-stable operation. If desired, the output inverter of each such weak latch may be individually skewed to favor latching towards ground or towards Vcc. The skewing selection depends on which of the GND and Vcc buses is better suited at that locale in the chip for absorbing switching noise.

FIG. 7B illustrates an intertwining technique that may be used to pair together bands of MFB/IFB output lines from each segment pair. In the illustrated embodiment 780, the 192 GSM-feeding lines of each segment (A-H) are divided into bands of 24 lines each. A first V24 output band, B1, of segment A is laid out adjacent to a corresponding first V24 output band, B1, of segment B so as to define a V48 intertwined band. In a slightly varied embodiment, a first 3-wire bundle is taken from SLB_A1 and laid out adjacent to a second 3-wire bundle taken from SLB_A2, where the latter is laid out adjacent to further 3-wire bundles from SLB_B1 and SLB_B2 so as to define a first quarter (12 wires) of the V48 intertwined band. The other three quarters repeat a similar pattern and may take their 3-wire bundles further from SLB_A3, A4, B3 and B4. There are 4 such V48 intertwined output bands per segment (4x48=192). The GSM is also subdivided into bands of 48 H-LL's each. There are 8 such H48 signal-receiving bands (8x48=384). GSM longlines are designated as LL.0 through LL.383 here. Signal-receiving band B1, for example, is formed by GSM longlines LL.0 through LL.47. As will be seen in FIG. 9A, these GSM longlines may form different bands for output purposes.

PIP's (not shown) are distributed in the cross areas of the V24 and H48 bands of FIG. 7B to implement the distributed multiplexing functions shown in FIG. 7A.

The band intertwining technique helps to reduce the length of interconnect lines needed for transmitting GSM-feeding signals (e.g., 581) from respective segments to corresponding ones of the longline-driving multiplexers (e.g., 711-714) of each GSM longline (e.g., 710). This may be seen in the layout of FIG. 7C.

FIG. 8 shows a usage of the embodiment of FIG. 5 where a pin 516a (and corresponding nonburied pad) in segment SEG_A provides an input signal for an SLB 510H in a different segment, SEG_H. Signal propagation is through pin 516a, then the corresponding input buffer 536a of that pad, then through a corresponding GSM-feeding line 581a, the GSM 580, and out onto bus 585h into SSM 550H of segment SEG_H. From SSM 550H, the signal continues along SLB input bus 511h into respective SLB 510H. A corresponding IFB result signal is then passed through pad driver 526h to pin 516h.

As already explained, the difference in delay between acquiring external input signals from an out-of-segment pin such as 516a instead of from an intra-segment pin is that of propagating the signal through GSM 580. Otherwise, for the balanced local/global design of FIG. 5, the delay through the out-of-segment input buffer 536a is essentially the same as it would be for an intra-segment buffer, and the delay through the global portion (552) of SSM 550H is essentially the same as it would be through the local portion (551). Thus, if the pin-to-pin delay for processing signals only within a segment is about 7.5 nS or less, and the propagation delay through GSM 580 is about 2.5 nS or less, then the combined delay for the circuit configuration shown in FIG. 8 is a SpeedLockable one of about 10 nS or less.

There are two more features to be associated with FIG. 8: PinOut-Consistency and pin/pad borrowing. Let us assume that pin 516a has already been specified to provide a particular, board-level function. The pinout location for that frozen function cannot be changed anymore in this example. However, the CPLD configuring software finds that the best placement for an SLB which is to use the signal of pin 516a is in segment SEG_H. The placement does not have to be specifically at SLB position 210H (see FIG. 2). It could be at 220H, 230H or 240H. Element 516h may be either a pin or a buried pad (no connection to a pin) in this example. Because 12-way routability is available from the pre-specified pin 516a and through GSM 580 to get to SSM 550H, the CPLD configuring software should have a relatively easy task of finding a workable placement and routing solution. If 516h is a pin rather than a buried pad in this example and the location of pin 516h is also frozen, then obviously the degrees of freedom that the software has will be reduced. Nonetheless, the CPLD configuring software should have a relatively easy task of finding a workable routing solution.

Still referring to FIG. 8, let us assume that segment SEG_H needs to participate in a one-pass operation that calls for receipt over SLB input bus 511h of 80 external input signals from a corresponding 80 pins, and that each segment has 64 nonburied pads. For this example, the desired output from SLB 510H is an MFB, not an IFB. Thus pin 516h is free to function as an input pin. The local portion 551 of SSM 550H can collect 64 external input signals from the corresponding 64 pins (including pin 516h) of its segment SEG_H. Up to sixteen additional pins can be 'borrowed' from one or more other segments using the technique shown in FIG. 8. Thus all 80 SLB input lines of bus 511h can simultaneously receive external input signals so that SLB 510H can process them in one pass.

If it were not possible to route the 80 external input signals to SLB 510H for one-pass processing, then the implementation might have to use two or more processing passes to generate the desired output function. This can
significantly slow the throughput of the design-implementing CPLD. The exemplary architecture of FIG. 3, however, can transfer as many as 80 external input signals per receiving segment to as many as 24 receiving SLB's (that is, six receiving segments and two pin-donating segments) by using all 384 GSM longlines for pin borrowing (16x24=384) and by borrowing 96 pins (6x16=96) from the remaining, pin-donating segments. More specifically, if each of segments A, B, C, D, E and F needs a mutually exclusive set of 80 I/O pins for receiving input signals, then segment G can donate 64 of its I/O pins (assuming all of its pads are nonburied) and segment H can donate 32 of its I/O pins. Segment H would then still have 32 I/O pins remaining for pumping out result signals. While this is an extreme case, it demonstrates the robustness of the CPLD architecture disclosed herein.

FIG. 9A shows a banding scheme 985 that may be used for distributing GSM-carried, global signals back to respective SSM's of different segments. In the illustrated embodiment 985, the 192 SSM-feeding lines that extend from the GSM to each segment (A-H) are divided into bands of 24 lines each. A first V24 input band, B1, of segment A is laid out adjacent to a corresponding first V24 input band, B1, of segment B to thereby define a V48 intertwined band. There are 4 such intertwined input bands per segment (4x48=192). (A slightly different layout is shown in FIG. 9E and discussed below.) The GSM is also subdivided into bands for purposes of distributing its global signals to the SSM's. The GSM output bands have 16 H-LL's each. There are 24 such H16, signal-sourcing bands (24x16=384). GSM longlines are again designated as LL.0 through LL.383 here. Signal-sourcing band B1, for example, is formed by GSM longlines LL.0 through LL.15.

PIP's (not shown) are distributed in the cross areas of the V24 and H16 bands of FIG. 9A to implement a distributed multiplexing function that is next described with reference to FIG. 9B. The band intertwining technique helps to reduce the length of interconnect lines needed for transmitting GSM-sourced signals (e.g., 585) from the GSM to respective segments.

A further technique for reducing the length of interconnect lines needed for transmitting GSM-sourced signals to respective segments is shown in FIG. 9E and will be discussed below. In FIG. 9A, a plurality of 8:1 multiplexers are defined and distributed relative to GSM and SSM lines to reduce total line length and to provide CPLD configuring software with uniformly distributed routing options. The distributed, multiplexer-based routing of signals from the GSM lines into SSM lines may be seen at a gross level in the layout of FIG. 7C. SSM's A, B, C, D are vertically aligned in this embodiment with the perpendicularly-extending output lines of a first set of 8:1 multiplexers. SSM's E, F, G, H are vertically aligned in this embodiment with similar output lines of a second set of 8:1 multiplexers. The distance from the horizontal lines of the GSM to the vertical lines of the SSM's is thereby minimized.

FIG. 9B shows more detail for implementing one version of the SSM-feeding 8:1 multiplexers. Line 901 is the single output line of one such 8:1 multiplexer. A first subset (e.g., four) of the 8 corresponding PIP's are distributed over a corresponding subset of the 16 horizontal lines of GSM output band B1. A second subset (e.g., four) of the 8 corresponding PIP's of the same MUX 901 are distributed over a corresponding subset of the 16 horizontal lines of GSM output band B2. (The illustration showing 4 PIP's in each GSM output band is just a simple introduction to a concept that will be further elaborated below.) This pattern of distributing 8 PIP's across two assigned GSM output bands is repeated three more times, with each of the repeats being vertically distributed differently. In the illustrated, simple example, the differently distributed PIP's are shown as being staggered to cover mutually exclusive subsets of 4 of the 16 horizontal lines of respective GSM output bands B1 and B2.

FIG. 9C shows a more microscopic view of one such simplistic embodiment, in which multiplexer 901' is defined by PIP's placed on the intersections of GSM shortline SL.0 with GSM longlines LL.0 through LL.3 and LL.16 through LL.19. GSM shortline SL.0 extends to become a like-numbered longline of the corresponding SSM global section 552 (see FIG. 5).

In FIG. 9B, the V4 structure which includes multiplexer 901 is repeated one more time to thereby produce a generally similar, second V4 structure that is referenced as 904 and has its respective distributions of 8 PIP's per SL, where the 8 PIP's are restricted to GSM output bands B1 and B2. The resulting V8 lines and PIP's (not shown) of these two V4 structures are referenced as 905. One simplistic version of this V8 structure is shown as 905' in FIG. 9C and is seen to uniformly but partially populate the intersections of GSM longlines LL.0 through LL.31 with GSM shortlines SL.0, SL.8, SL.16, SL.24, SL.32, SL.40, SL.48, and SL.56. Note that only every eighth shortline is shown here and that other shortlines, e.g., SL.1, SL.2, etc., populate the regions in between. The distribution of the band B1 and B2 PIP's on every eighth line helps to space such B1/B2-populating PIP's apart from one another. PIP's that populate other bands, e.g., B3/B5, are interposed. See, for example, SL.1 of FIG. 9D. This allows the SL's (shortlines) to be packed closely together without the widths of respective PIP elements colliding into one another.

The next V8 structure in FIG. 9B is referenced as 908 and is seen to have PIP's that populate only GSM output bands B1 and B3. The next such V8 structure is then seen to populate only GSM output bands B2 and B4, and so on. Output bands B23 and B24 are covered by the last such V8 structure of the sequence.

In the version represented in FIG. 9C, the PIP's of the V8 multiplexing structure that covers output bands B1 and B3 are broadly referenced as 908'. Note that every eighth shortline is again shown here, in the sequence SL.64 through SL.120.

In the version continued in FIG. 9D, the PIP's of the V8 multiplexing structure that covers output bands B2 and B4 are broadly referenced as 912'. Note that every eighth shortline is again shown here, in the sequence SL.128 through SL.184. After wrapping around beyond SL.191, the next unused shortline is SL.1, which begins the every-eighth-shortline sequence for the next V8 multiplexing structure, the one that covers output bands B3 and B5. This set of PIP's is broadly referenced as 916'. It is understood that the general pattern continues in this manner to thereby intermingle the GSM sourcing points with the SSM signal-receiving lines.

The regularly-replicating pattern of PIP's shown in FIGS. 9C-9D turns out to be less than ideal for routing randomly-defined connections between the GSM and a targeted SSM. By way of a very simple example, assume that a nibble-wide first signal (4 bits wide) appears on LL.0-LL.3 for transfer into the segment serviced by V8 multiplexers 905' and 908'. If the non-random PIP placement of FIG. 9C is used, then the only opportunities for the 4 bits on LL.0-LL.3 to transfer into the segment via multiplexers 905' and 908' are on the four specific shortlines SL.0, SL.32, SL.64 and SL.96. There is a finite probability
however, that one or more of lines SL.0, SL.32, SL.64 and SL.96 have already been 'consumed' by the CPLD configuring software for transferring a GSM band B2 or B3 signal into the same segment. That finite probability can be disadvantageously increased by the possibility that another nibble of data is coming in on longlines LL.16-LL.19 or on longlines LL.32-LL.35. In such a hypothetical case, the non-random repetition of 4 adjacent PIP's in each band increases the likelihood of contention for the same shortlines (a collision). One of those N-bit wide signals will not be able to get through.

Referring again to FIG. 9B, consider now a more random signal transference problem. Say there are 8 separate GSM-sourced signals, each randomly assigned to one of the 32 longlines of GSM output bands B1 and B2. The simplistic V8 structure 905' of FIG. 9C has 8 SL lines (vertical lines) for outputting such a supplied set of 8 sourced signals. However, the V8 structure 905' is not fully populated with PIP's. There are 256 crosspoints (32x8) but only 64 PIP's, a populating percentage of only 25%. If we are lucky, the 8 sourced signals will come in on a 'good' subset of longlines, so that each one can get out by way of a respective one of just 8 output lines. But what is the chance of that happening? Not very good. Stated more specifically, what are the chances that those 8 separate signals will distribute 50% and 50% across bands B1 and B2, so that the limited number of symmetrically-placed PIP's in the B1 and B2 parts of simplistic V8 structure 905' (FIG. 9C) can pick them up and output all of them?

With a little thought, it can be quickly appreciated that randomly dispersed GSM-sourced signals are not likely to split exactly 50% and 50% across bands B1 and B2. Instead, the more likely outcome is that one of bands B1 and B2 will receive more than 50% while the other receives less. What will happen then if we shift the population distribution of PIP's in the simplistic V8 structure 905' of FIG. 9C so that, for some SL lines, band B1 has more PIP's than band B2, and, for some other SL lines, band B2 has more PIP's than band B1? That will help to relieve the band-overloading problem. In essence and conceptually speaking, PIP's are being slid from, or borrowed from, one band to the other. In practice, the PIP's do not slide. They are fixed to the location they are placed in. The sliding from one band to another occurs during the design stage, when we are considering where to fixedly place each PIP.

Because the dispersal of GSM-sourced signals is likely to be random, the conceptual sliding of PIP's from one band into another should also be random. PIP's patterning should be such that it improves the likelihood of successful routing into a target segment. Rather than having a regular patterning of PIP's, the better patterning approach tries to randomize the locations of PIP's both across GSM bands and within each GSM band, so as to thereby minimize the likelihoods of routing collisions and/or band overloading. The total number of PIP's per SL line still remains 8, but the number of PIP's per assigned GSM output band will vary from one SL line to the next.

Because the desired PIP's distribution is a random one, there is no easy way to show such a PIP's distribution pattern in the drawings. The human eye will not be able to discern a regular pattern such as that of the less-desirable, diagonal blocking pattern seen in FIGS. 9C-9D. Also, there is no one random distribution of PIP's that is better than all others.

The best way to describe the desired PIP's distribution pattern is to explain that such patterns are to be generated by a computer program. The algorithm operates to provide the PIP placements with an orthogonal correlation between GSM longline numbers and shortline numbers. Consider, for example, LL.0-LL.3 of FIG. 9C. Their PIP placements correlate 100% when any randomly-chosen two of longlines LL.0-LL.3 are tested. Our goal is to approach a 0% correlation for such pair-wise testings of PIP placements. If there is a PIP placed at the intersection of longline LL.a (a is an arbitrary number here between 0 and 32) and shortline SL.8, there should not be a PIP placed at the intersection of longline LL.b (b is an arbitrary, different number here between 0 and 32). Perfect 0% correlation is not likely. Instead, we add up the correlation values for all permutations of longlines a and b and then seek a PIP's distribution pattern that minimizes that sum.

The problem is symmetrical for shortlines or longlines. There are fewer shortlines per segment than longlines, and thus the number of permutations to be considered is much smaller if we seek to minimize PIP placement intersections (matchings) and to maximize unions (non-matchings) in pair-wise considerations of all shortlines of a given segment. More specifically, in FIG. 9C, the pair-wise intersections test for SL.0 and SL.32 will produce an outcome of 8 matches. That is an undesired result. If we slide one of the PIP's of SL.0 or SL.32 to a new position within bands B1 or B2, that will desirably reduce the matches outcome to 7. Contrastingly, the pair-wise intersections test for SL.0 and SL.8 will produce an outcome of 0 matches. That is a highly desired result. If we slide one of the PIP's of SL.0 or SL.8 to a new position within bands B1 or B2, that might undesirably increase the matches outcome from 0 to 1. The latter outcome is still much better than an 8.

The randomizing algorithm can operate as follows. One starts with a PIP's placement pattern that is chosen at random as a seed and conforms to the restriction of 8 PIP's per SL, where the PIP's of each SL are confined to the assigned GSM output bands of that SL. The pair-wise intersections tests for all unique permutations of SL.a and SL.b are carried out, and the results are added and saved. Next, a PIP slide is attempted and the intersections tests are repeated to see if the intersections sum has been reduced by the trial slide. If yes, the slide is kept and a next slide is tried. If no, it is generally discarded and a different slide is tried. Sometimes the algorithm may get 'stuck' if all non-improving slides are discarded. A subroutine keeps track of whether the inner loop has fallen into such a stuck mode. If yes, it instructs the inner loop to keep some of its bad slides in order to migrate out of the saddle area. Numerous algorithms of this kind are known in the art and thus do not have to be detailed here. The resulting PIP's placement should have a relatively minimized intersections sum and a relatively maximized unions sum. This will help to get random distributions of GSM global signals into the SSM of a desired segment.

While we have described the result as that of PIP's being slid from, or borrowed from, one GSM output band to another, it is to be understood that each of the horizontally dispersed PIP's has a shortline associated with it. Thus, at the same time that a PIP is being conceptually 'slid' in the vertical direction to be loaned from one GSM output band to another, the corresponding shortline may be conceptualized as being 'slid' in the horizontal direction to be loaned from one GSM output band to another. As a result, for a given locale, one particular GSM output band will have more shortlines than the average number of shortlines allocated per GSM output band, and a competing GSM output band will typically have fewer shortlines than the average number.
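The randomizing procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's program. It models a single V8 structure: 8 shortlines with 8 PIP's each, all confined to the 32 longlines of bands B1 and B2. It seeds the search with the FIG. 9C-style diagonal-block pattern instead of a random seed, so that the improvement is visible, and the names `regular_pattern`, `cost` and `randomize` are hypothetical. There is one deliberate deviation from the text. A plain sum of pair-wise intersection counts equals the sum, over longlines, of the number of shortline pairs that share each longline, so in this toy model that sum cannot tell the regular pattern from a scrambled one. The sketch therefore squares each pair's match count. With that penalty, one 8-match pair such as SL.0/SL.32 costs far more than several 1-match pairs, which matches the preference the text states. The stuck-mode escape is modeled as keeping the next slide after a fixed run of rejections.

```python
import itertools
import random

NUM_LL = 32        # LL.0-LL.31: bands B1 (LL.0-LL.15) and B2 (LL.16-LL.31)
PIPS_PER_SL = 8    # fixed number of PIP's on every shortline


def regular_pattern(num_sl=8):
    """FIG. 9C-style diagonal blocks: shortline k taps four adjacent longlines
    in B1 and the like-numbered four in B2, so shortlines k and k+4 are
    identical (like SL.0 and SL.32)."""
    pattern = []
    for k in range(num_sl):
        base = 4 * (k % 4)
        taps = [base + i for i in range(4)] + [16 + base + i for i in range(4)]
        pattern.append(set(taps))
    return pattern


def cost(pattern):
    """Sum over unique shortline pairs of (shared longlines) squared.
    Squaring is this sketch's assumption, not the text's stated metric."""
    return sum(len(a & b) ** 2 for a, b in itertools.combinations(pattern, 2))


def randomize(pattern, iters=5000, stuck_limit=200, seed=1):
    """Slide one PIP at a time to an untapped longline of the same two bands.
    Improving slides are kept; after `stuck_limit` consecutive rejections the
    next slide is kept even if it is worse. Returns the best pattern seen."""
    rng = random.Random(seed)
    pattern = [set(s) for s in pattern]
    current = cost(pattern)
    best, best_cost = [set(s) for s in pattern], current
    rejected = 0
    for _ in range(iters):
        k = rng.randrange(len(pattern))
        old = rng.choice(sorted(pattern[k]))
        new = rng.choice([ll for ll in range(NUM_LL) if ll not in pattern[k]])
        pattern[k].remove(old)
        pattern[k].add(new)
        trial = cost(pattern)
        if trial < current or rejected >= stuck_limit:
            current, rejected = trial, 0           # keep this slide
            if current < best_cost:
                best, best_cost = [set(s) for s in pattern], current
        else:
            pattern[k].remove(new)                 # discard this slide
            pattern[k].add(old)
            rejected += 1
    return best, best_cost


seed_pattern = regular_pattern()
print(cost(seed_pattern))   # 256: four identical pairs with 8 matches each
best, best_cost = randomize(seed_pattern)
print(best_cost)            # lower than 256; exact value depends on the seed
```

Because a slide may land in either band, the per-band PIP count of a shortline drifts away from 4 and 4. This is the conceptual "borrowing" between B1 and B2 that the text describes.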
More specifically, FIG. 9B shows, for example, that a V8 structure such as 908 provides, on average, 4 shortlines per GSM band for routing the up-to-16 signals of each of its corresponding H16 GSM output bands up to a respective SSM. Thus, on average, a V8 structure such as 908 provides 4 shortlines for GSM band OUT-B1 and 4 more shortlines for GSM band OUT-B3.

Suppose, however, that one of the corresponding GSM output bands, let's say OUT-B3, is sourcing 5 or more signals for routing up to the respective SSM of exemplary V8 structure 908. Clearly, the average number of 4 shortlines allocated to output band OUT-B3 cannot do the job. However, under the randomly-sliding PIP's algorithm, there is a good likelihood that somewhere within the GSM-to-SSM crosspoints area there will be a V8 structure, such as exemplary 908, that has borrowed additional shortlines from one of its GSM bands to thereby provide more than the average number of shortlines to the other of its GSM bands. The CPLD configuring software needs merely to locate such a V8 structure and use its above-average number of shortlines to route a like, above-average number of within-GSM-output-band signals to their targeted SSM.
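A short calculation answers the question raised earlier: how often do 8 randomly placed signals split exactly 4 and 4 across bands B1 and B2? The sketch below assumes, as our own modeling choice, that each signal occupies a distinct longline chosen uniformly from the 32 longlines of the two bands. Under that assumption the per-band count follows a hypergeometric distribution. An exact even split then occurs only about 31% of the time. In roughly two cases out of three, one band needs more than its average of 4 shortlines, and that is the situation a borrowing V8 structure relieves.

```python
from fractions import Fraction
from math import comb


def split_distribution(signals=8, band_width=16):
    """P(exactly k signals land in band B1) for k = 0..signals, assuming each
    signal sits on a distinct longline drawn uniformly from the
    2 * band_width longlines of bands B1 and B2 (hypergeometric model)."""
    total = comb(2 * band_width, signals)
    return [Fraction(comb(band_width, k) * comb(band_width, signals - k), total)
            for k in range(signals + 1)]


dist = split_distribution()
p_even = dist[4]            # 4 signals in B1 and 4 in B2
p_overload = 1 - p_even     # one band receives 5 or more signals
print(round(float(p_even), 3))      # 0.315
print(round(float(p_overload), 3))  # 0.685
```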

Contrastingly, and for the sake of efficiency, there may be another GSM output band, let's say OUT-B1, that is sourcing 3 or fewer signals for routing up to a targeted SSM. Clearly, the average number of 4 shortlines allocated to output band OUT-B1 is more than enough to do the job. In fact, it is too much; if so used, it will constitute a wastage of resources.

However, under the randomly-sliding PIP's algorithm, there is a good likelihood that somewhere within the GSM-to-SSM crosspoints area there will be a V8 structure, such as exemplary 908, that has loaned shortlines to another of its GSM bands, thereby providing fewer than the average number of shortlines for the lending GSM band. The CPLD configuring software may choose to locate such a V8 structure and use its below-average number of shortlines to route a like, below-average number of within-GSM-output-band signals to their targeted SSM. This will help to increase resource-usage efficiency and will assist in the routing of signal groups that need an above-average number of shortlines to get to their targeted SSM.
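The selection step described in the last two paragraphs amounts to a best-fit search. The sketch below is hypothetical. It assumes that the configuring software keeps, for each V8 structure, a summary of how many of its 8 shortlines effectively serve each of its two GSM output bands. The `V8Structure` record, the `pick_v8` name and the example pool are illustrative and do not come from the text. Choosing the candidate with the smallest surplus sends below-average demands to below-average structures and leaves the above-average ones free for larger signal groups.

```python
from typing import Dict, NamedTuple, Optional, Sequence


class V8Structure(NamedTuple):
    """Hypothetical summary of one V8 structure after randomized PIP placement."""
    name: str
    shortlines_per_band: Dict[str, int]   # e.g. {"OUT-B1": 3, "OUT-B3": 5}


def pick_v8(structures: Sequence[V8Structure], band: str,
            demand: int) -> Optional[V8Structure]:
    """Best fit: among structures offering at least `demand` shortlines to
    `band`, return the one with the smallest surplus (None if none qualifies)."""
    candidates = [s for s in structures
                  if s.shortlines_per_band.get(band, 0) >= demand]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.shortlines_per_band[band] - demand)


pool = [
    V8Structure("v8_a", {"OUT-B1": 4, "OUT-B3": 4}),   # average split
    V8Structure("v8_b", {"OUT-B1": 3, "OUT-B3": 5}),   # B3 has borrowed one
    V8Structure("v8_c", {"OUT-B1": 2, "OUT-B3": 6}),   # B3 has borrowed two
]
print(pick_v8(pool, "OUT-B3", 5).name)   # v8_b
print(pick_v8(pool, "OUT-B1", 2).name)   # v8_c
```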

Referring to FIG. 9E, another aspect of getting GSM global signals into the SSM of a desired segment has to do with wire lengths. In the embodiments of FIGS. 3B and 7C, the A segment is positioned directly above the B segment, D is below C, and so forth. A bundle of 192 wires has to be snaked from the GSM to the SSM of each respective segment. For SSM_B that is no problem. The wires (SL_B.0 through SL_B.191) can be extended vertically, directly up as a full bundle 993 from the GSM into SSM_B.

For SSM_A, however, the Segment Switch Matrix of segment B is in the way. The 192 wires for SSM_A may be snaked around as a full bundle by extending alongside SSM_A, then bending in (994) and then bending up (996) into SSM_A. The more preferred approach, however, splits the segment-feeding wires (SL_A.0 through SL_A.191) into two equal-sized, symmetrical sub-bundles, 991 and 992. Sub-bundle 991 (SL_A.0 through SL_A.95) extends along the left side of SSM_B and then bends in and up to enter the left side of SSM_A. Sub-bundle 992 (SL_A.96 through SL_A.191) extends along the right side of SSM_B, then bends in as indicated at 994, extends horizontally by distance 995, and then bends up as indicated at 996 to enter the right side of SSM_A.

The split sub-bundles approach has the advantage that each 96-wire wide sub-bundle (991 and 992) extends horizontally, such as at extension 995, for a shorter distance than would have been needed if a full 192-wire wide bundle followed the same route. In the latter case, extension 995 would have had to stretch the full width of SSM_A, whereas in the illustrated case it does not need to stretch more than half the width of SSM_A. Also, because the sub-bundles 991 and 992 are each narrower than a full bundle (e.g., 993), the average arc lengths of inward bends such as 994 and upward bends such as 996 tend to be smaller. Further, because the sub-bundles 991 and 992 are each narrower than a full bundle, the separation between the SLB's and SSM's can be reduced. Thus, to the extent possible, wire lengths are minimized.
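The half-width claim can be checked with a deliberately simplified model, which is our own assumption for illustration. SSM_A's 192 inputs sit at unit-pitch columns. A bundle arriving at an edge of SSM_A must run horizontally to each wire's column, and bend arcs are ignored. The hypothetical `horizontal_runs` helper compares one full bundle brought around one side with the 96/96 split of sub-bundles 991 and 992.

```python
def horizontal_runs(width=192, split=True):
    """Horizontal run, in input-column pitches, for each wire entering an SSM
    whose inputs SL_A.0..SL_A.(width-1) sit at columns 0..width-1.
    Simplified model (assumption): a bundle arriving on the left edge (x = 0)
    runs i pitches to reach column i; one arriving on the right edge
    (x = width) runs width - i pitches."""
    half = width // 2
    if split:
        # Sub-bundle 991 enters from the left, sub-bundle 992 from the right.
        return [i for i in range(half)] + [width - i for i in range(half, width)]
    # A full bundle brought around one side (the right side in this model).
    return [width - i for i in range(width)]


full = horizontal_runs(split=False)
halves = horizontal_runs(split=True)
print(sum(full), max(full))      # 18528 192
print(sum(halves), max(halves))  # 9216 96
```

In this model the split roughly halves the total horizontal run (9216 versus 18528 pitches), and it caps the longest run at half the width of SSM_A.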

Referring back to FIG. 1, it is seen in a summarizing 
review of the above that a robust CPLD architecture has 
been disclosed for efficiently adapting to the control over- 
head needs, pinout needs, and speed requirements of designs 
whose parallel address and/or data paths are 16-bits wide, 
32-bits wide, or 64-bits wide. Board level designs can be 
provided in which CPLD glue or other logic exhibits 
re-design Speed-Consistency, and/or re-design PinOut- 
Consistency, and/or the ability to implement in one pass, the 
generation of complex function signals. Such complex func- 
tion signals can be expressed by the expressive form: 

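$$ f_{\mathrm{SoP\ or\ PoS}} = \mathrm{Sgn}\left( \sum \prod^{K_{\max}} \left( \{L_1 * Q_1\} + \{L_2 * Q_2\} \right) \right) \qquad \text{\{Exp. B2\}} $$

where f_{SoP or PoS} can be either a Boolean sum-of-products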
(SoP) or a Product of Sums (PoS), depending on whether the polarity-controlling sign value Sgn is a NOT function (-1) or a pass-through function (+1). In Exp. B2, the sum L1+L2 represents the number of longlines in the Segment Switch Matrix (SSM) of each segment. The number of SLB input lines provided for each of the plural SLB's of each segment is Kmax, and this number is at least 112.5%, and more preferably at least 125%, of the expected maximum parallel address and/or data path width (e.g., B=64 bits, B=96 bits, B=128 bits, etc.). The Level 1 number of interconnect lines, L1, is at least equal to the sum of the number of macrocell feedback signals (MFB's) and the number of I/O feedback signals (IFB's) produced by each SLB, so as to provide 100% local connectivity. The Level 1 multiplexing factor, Q1, provides at least a 3-way routing flexibility at the local level, where such N-way routing flexibility equals Q1 times Kmax divided by L1. Further in Exp. B2, the number L2 of global signals that can be fed from the GSM into each SSM is at least as large as L1, although L2 can be larger. This allows any first segment to forward all of its macrocell feedback signals (MFB's) and I/O feedback signals (IFB's) by way of the GSM to any second segment. The Level 2 multiplexing factor, Q2, provides at least a 3-way routing flexibility at the SSM-to-SLB interconnect level, where such N-way routing flexibility equals Q2 times Kmax divided by L2. CPLD configuring software is thereby given wide latitude in making routing choices at both the local and global interconnect levels.
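The design rules attached to Exp. B2 are simple inequalities, so they can be checked mechanically. In the sketch below, B = 64 is the text's example path width, Kmax = 80 matches the 80 SLB input lines of bus 511h, and L2 = 192 matches the 192 SSM-feeding lines per segment of FIG. 9A. The values of L1, Q1, Q2 and the feedback-signal count are hypothetical placeholders, and so is the function name `check_exp_b2`.

```python
def check_exp_b2(B, k_max, l1, q1, l2, q2, mfb_plus_ifb):
    """Check the Exp. B2 design rules stated in the text. Returns the Level 1
    and Level 2 N-way routing flexibilities and a dict of pass/fail flags."""
    n1 = q1 * k_max / l1      # Level 1 (local) routing flexibility
    n2 = q2 * k_max / l2      # Level 2 (SSM-to-SLB) routing flexibility
    rules = {
        "Kmax >= 112.5% of B": k_max >= 1.125 * B,
        "Kmax >= 125% of B (preferred)": k_max >= 1.25 * B,
        "L1 >= MFBs + IFBs": l1 >= mfb_plus_ifb,
        "N1 >= 3": n1 >= 3,
        "L2 >= L1": l2 >= l1,
        "N2 >= 3": n2 >= 3,
    }
    return n1, n2, rules


# L1, Q1, Q2 and the feedback count below are hypothetical example values.
n1, n2, rules = check_exp_b2(B=64, k_max=80, l1=128, q1=8, l2=192, q2=8,
                             mfb_plus_ifb=128)
print(n1, round(n2, 2))     # 5.0 3.33
print(all(rules.values()))  # True
```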

The above disclosure is to be taken as illustrative of the 
invention, not as limiting its scope or spirit. Numerous 
modifications and variations will become apparent to those 
skilled in the art after studying the above disclosure. 

By way of a first example, although FIGS. 3A-3B show 8 segments, it is within the spirit of the invention to provide CPLD devices that have fewer or more segments. If a larger number of segments is used, then the design of the GSM should be changed (expanded) so that routability remains at 3-way or better for each GSM-feeding line such as 581, and so that all segments have symmetrical and uniform access to the GSM resources, as is depicted in FIG. 7A.
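Why the GSM must grow with the segment count can be illustrated on the GSM-to-SSM side, using figures given above. There are 192 SSM-feeding lines per segment, 8:1 multiplexers, and 384 GSM longlines for 8 segments. That gives, on average, 4 PIP's per longline per SSM, which is consistent with LL.0-LL.3 reaching the segment only through the four shortlines SL.0, SL.32, SL.64 and SL.96. The sketch makes a hypothetical assumption: each added segment adds 48 GSM longlines while the SSM side stays fixed. It computes an illustrative ratio only, not the text's routability measure for GSM-feeding lines such as 581. The helper names are also hypothetical.

```python
import math

LONGLINES_PER_SEGMENT = 48   # assumption: 8 segments x 48 = 384 GSM longlines
SSM_FEED_LINES = 192         # SSM-feeding lines per segment (FIG. 9A)
MUX_WIDTH = 8                # 8:1 SSM-feeding multiplexers


def ways_per_longline(segments, feed_lines=SSM_FEED_LINES, mux_width=MUX_WIDTH):
    """Average number of PIP's through which one GSM longline reaches a given SSM."""
    return feed_lines * mux_width / (LONGLINES_PER_SEGMENT * segments)


def min_mux_width(segments, feed_lines=SSM_FEED_LINES, target=3):
    """Smallest N:1 multiplexer width that keeps at least `target` ways per longline."""
    return math.ceil(target * LONGLINES_PER_SEGMENT * segments / feed_lines)


print(ways_per_longline(8))    # 4.0
print(ways_per_longline(16))   # 2.0
print(min_mux_width(16))       # 12
```

Under these assumptions, doubling to 16 segments would drop the average to 2 ways per longline unless the SSM-feeding multiplexers (or the number of SSM-feeding lines) were enlarged. This is one concrete reading of "the design of the GSM should be changed (expanded)".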